Methods and Compositions Using Highly Conserved Pneumococcal Surface Proteins

ABSTRACT

Cold spot genes of S. pneumoniae are disclosed that encode surface proteins that are universally conserved among known strains and have an exceptionally low incidence of allelic variation. Cold spot polypeptides encoded by the genes that are antigenic on the S. pneumoniae cells on which they are expressed are candidates for immunogenic compositions capable of eliciting antibodies able to react with all or nearly all strains of S. pneumoniae, thus providing an improvement over currently available S. pneumoniae vaccines that protect inoculated individuals against a maximum of about 23 of the 94 or so known serotypes of S. pneumoniae.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of priority to U.S. Provisional Application No. 62/479,848, filed Mar. 31, 2017.

STATEMENT AS TO FEDERALLY SPONSORED RESEARCH

This invention was made with Government Support under Contract No. AI081687 awarded by the National Institutes of Health. The Government has certain rights in the invention.

FIELD OF THE INVENTION

This invention is generally in the field of infections by the bacterium Streptococcus pneumoniae. In particular, the invention provides methods and compositions that use one or more highly conserved genes encoding a surface protein of Streptococcus pneumoniae for diagnosing, treating, or preventing infection by Streptococcus pneumoniae. In particular embodiments, immunogenic compositions according to the invention include at least one in-common, highly conserved Streptococcus pneumoniae surface antigen or immunogenic fragment thereof, which may be used to elicit an immune response against a majority (i.e., >50%), advantageously a supermajority (i.e., at least 80%), and ideally all or nearly all (e.g., 90 or more) of the known capsular serotypes of Streptococcus pneumoniae, thus advancing modes of addressing Streptococcus pneumoniae infection toward a universal vaccine.

BACKGROUND OF THE INVENTION

Pneumonia is an ailment that can be lethal, particularly to the elderly and the young. Every year, approximately 1.3 million children under the age of five die from pneumonia, mostly in the developing world. Infection by the bacterium Streptococcus pneumoniae (also referred to herein as “pneumococcus”), the most common cause of severe pneumonia, kills more than half a million children annually. While even healthy adults may be infected by S. pneumoniae, the elderly, immunocompromised individuals, and children are at greatest risk from the disease. Pneumococcus can also cause bacterial sepsis, bacterial sinusitis, and bacterial meningitis, and is the leading cause of middle ear infection (otitis media).

S. pneumoniae strains vary by region, and there are more than 94 capsular serotypes. While there has been some success in making pneumococcal vaccines, current pneumococcal vaccines do not protect against most pneumococcal capsular serotypes. For example, one currently marketed vaccine, PREVNAR-13® (Wyeth LLC, marketed by Pfizer, New York, N.Y.), is only partially efficacious, protecting against only 13 of the 94 or so pneumococcal capsular serotypes. Another marketed vaccine, PNEUMOVAX®-23 (Merck, Kenilworth, N.J.), offers no protection for children less than two years old, and no protection for older children and adults against 71 of the 94 or so pneumococcal capsular serotypes.

Accordingly, there has been an acute desire to find a “common protein” or “universal” vaccine that would be useful for immunizing against all or nearly all pneumococcal strains (currently ~94 known), a supermajority, e.g., at least 80%, or at least a majority, e.g., greater than 50%, of the known pneumococcal strains. A critical barrier to the development of such a “common protein” vaccine has been pneumococcal antigenic diversity driven by host immunological selective pressure. The paradigm for development of a common protein vaccine has primarily been empirical, top-down, trial and error testing of known antigenic outer surface protein candidates. Such an approach has yielded poor results primarily because promiscuously recombinant bacteria like S. pneumoniae are adept at “immune escape”, based on the fact that the immunologically active regions of nearly all major surface proteins that have been examined mutate constantly. Copies of a mutated gene that provides an S. pneumoniae cell with a selective advantage for evading attack by an immune response will be rapidly transferred from the mutant cell to recipient cells owing to a promiscuous horizontal (lateral) gene transfer system with subsequent recombination of the mutated gene (recombinational mutation) into the genomes of the recipient cells.

Clearly, needs remain for compositions and methods for identifying, treating, or preventing infections by S. pneumoniae that are more resistant to the adaptive ability of S. pneumoniae cells to evade an immune response.

SUMMARY OF THE INVENTION

The present invention provides a particularized family of streptococcal surface antigens that are useful as immunogens to elicit an immune response recognizing all or nearly all S. pneumoniae strains. Polypeptide immunogens of the present invention therefore represent a very important advance in the search for a universal pneumonia vaccine. The immunogenic polypeptides of the present invention are serotype-independent and are characterized by their presence across virtually all genomic variants of S. pneumoniae and by their very low incidence of sequence variability from one allele to another across S. pneumoniae populations.

The candidate immunogens of the invention are referred to herein as cold spot polypeptides, which makes reference to the fact that the encoding genes for the polypeptides are located in recombinationally quiescent regions of the S. pneumoniae genome flanking the rRNA operons and show very low rates of mutation. Genes from such recombinationally quiescent loci are termed “cold spot genes”, and their expression products are referred to herein as “cold spot polypeptides”.

In a particular embodiment, the present invention provides an immunogenic composition comprising:

(a) at least one cold spot polypeptide or an immunogenic fragment thereof, which polypeptide comprises at least a portion of an extracellular domain of a S. pneumoniae surface protein, wherein said protein:

(i) is in-common to all or nearly all known strains of S. pneumoniae, and

(ii) has at least a 98% average amino acid sequence pairwise homology among such known strains of S. pneumoniae;

(b) a pharmaceutically acceptable vehicle or excipient; and

(c) optionally, an adjuvant.

Preferred polypeptides disclosed herein have an average amino acid sequence pairwise homology of at least 99%. Most preferred polypeptides of the present invention have an average amino acid sequence pairwise homology of at least 99.5%.

In particular embodiments, immunogenic cold spot polypeptides may be selected from the surface antigens set forth in Tables 1A and 1B, infra, having SEQ ID NOs:1-273. Of particular interest are extracellular regions of such cold spot polypeptides, as those regions correspond to naturally occurring surface targets on pneumococcal cells. Particular extracellular domains derived from the top six of 21 selected cold spot polypeptides from Table 1A are set forth in Table 3, infra, having SEQ ID NOs:336-341. Natural allelic variants of such polypeptides and extracellular portions thereof are equally useful immunogens for the compositions and uses disclosed herein.

The invention also provides methods of making immunogenic compositions comprising one or more cold spot polypeptides described herein for use in raising an immune response against a majority (i.e., >50%), more preferably a supermajority (i.e., at least 80%), and even more preferably all or nearly all (e.g., 90 or more) of the known serotypes of Streptococcus pneumoniae.

In a particular embodiment, the invention provides a method of making an immunogenic composition for raising an immune response producing antibodies reactive against at least 80 different serotypes of S. pneumoniae, said method comprising:

(1) selecting one or more cold spot surface antigens of S. pneumoniae,

(2) isolating one or more polypeptide segments from an extracellular domain of the one or more cold spot surface antigens selected in (1), and,

(3) formulating said one or more isolated polypeptide segments obtained in (2) by admixing said isolated polypeptide segments with a pharmaceutically acceptable carrier, to produce an immunogenic composition for inoculating human subjects against S. pneumoniae infection.

In the above method of making an immunogenic composition according to the invention, the immunogenic composition may also include one or more adjuvants.

In another embodiment, in the method of making an immunogenic composition as described above, the immunogenic composition is effective for raising an immune response producing antibodies reactive with at least 90 S. pneumoniae serotypes. More preferably, the immunogenic composition in the above method of making an immunogenic composition of the invention is effective for raising an immune response producing antibodies reactive with at least 91 S. pneumoniae serotypes. Even more preferably, the immunogenic composition in the above method of making an immunogenic composition of the invention is effective for raising an immune response producing antibodies reactive with at least 93 S. pneumoniae serotypes.

In another embodiment, step (1) of the above method of making an immunogenic composition according to the invention selects 1, 2, 3, 4, or 5 cold spot surface antigens. In another embodiment, step (1) of the above method of making an immunogenic composition according to the invention selects 1 or 2 cold spot surface antigens. In a further embodiment, step (1) of the above method of making an immunogenic composition according to the invention selects 1 cold spot surface antigen from a strain of S. pneumoniae.

The present invention also relates to a method of eliciting an immune response in a mammal, such as a human, said method comprising administering to said mammal an immunogenic composition comprising:

(a) at least one cold spot polypeptide or an immunogenic fragment thereof, which polypeptide comprises at least a portion of an extracellular domain of a S. pneumoniae surface protein, wherein said protein:

(i) is in-common to all or nearly all known strains of S. pneumoniae, and

(ii) has at least a 98% average amino acid sequence pairwise homology among such known strains of S. pneumoniae;

(b) a pharmaceutically acceptable vehicle or excipient; and

(c) optionally, an adjuvant.

In other embodiments, such a method of eliciting an immune response will utilize two or more such immunogenic compositions.

Also contemplated herein is a method of vaccinating a subject against S. pneumoniae infection, said method comprising administering at least one immunogenic composition according to the invention to a subject in an amount or for a number of administrations sufficient to elicit an immune response characterized by the presence of antibodies recognizing a majority of S. pneumoniae serotypes. Currently, there are 94 known serotypes of S. pneumoniae, with additional serotypes expected to evolve or to become recognized in the future. Accordingly, for the purposes of the present invention, “a majority of S. pneumoniae serotypes” will signify 50 or more serotypes. Recognizing that commercial pneumonia vaccines (e.g., PREVNAR®, PNEUMOVAX®) available today address only up to 23 of the known serotypes, a vaccine composition capable of addressing a majority of S. pneumoniae serotypes represents a significant advance in the field of vaccine development.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows the estimation of the number of S. pneumoniae isolates necessary to analyze in order to be representative of species diversity. Two hundred thirty-one (231) fully sequenced S. pneumoniae genomes were analyzed and allelic variants compared for four cold spot proteins, MreD (SP_2217), MreC (SP_2218), HtrA (SP_2239) and an Fe ABC transporter (SP_1872), and for a more variable surface antigen, PspA (SP_0117). For each isolate (x axis), the y value increases by 1 if the isolate represents an allelic variant not previously identified in the genome set. The resulting plots are depicted in dotted lines, with fitted non-linear regression curves. The corresponding goodness of fit is shown in parentheses. The model used for the non-linear regression is the one-phase exponential association function y = y_max(1 − e^(−Kx)). As the data set increases and becomes more representative of species diversity, fewer new alleles appear, and the tangent to the regression curve tends toward a slope of 0, at which point one can estimate both the number of allelic variants likely to appear and the approximate number of isolates required in order to achieve this maximal representation of the diversity of the species. It is seen that for proteins as allelically invariant as cold spot proteins, the number of isolates necessary to analyze to be representative of S. pneumoniae diversity is between 100 and 200 isolates.
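By way of non-limiting illustration only, the allele-accumulation curve and the one-phase exponential association fit described for FIG. 1 may be sketched as follows. The function names are hypothetical, the grid search over K is an assumption chosen to keep the sketch free of external libraries, and nothing here represents the regression software actually used to prepare the figure.

```python
import math


def novel_allele_curve(alleles):
    """Return cumulative counts of distinct alleles, one value per isolate.

    The count rises by 1 whenever an isolate carries an allele not seen before.
    """
    seen = set()
    curve = []
    for allele in alleles:
        seen.add(allele)
        curve.append(len(seen))
    return curve


def fit_one_phase_association(ys, k_grid=None):
    """Least-squares fit of y = y_max * (1 - exp(-K * x)) with x = 1..len(ys).

    For a fixed K the best y_max has a closed form, so only K is searched
    (on a grid; an assumption of this sketch). Returns (y_max, K, r_squared).
    """
    if not ys:
        raise ValueError("at least one data point is required")
    xs = range(1, len(ys) + 1)
    if k_grid is None:
        k_grid = [i / 1000 for i in range(1, 2001)]  # K = 0.001 .. 2.0
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    best = None
    for k in k_grid:
        basis = [1 - math.exp(-k * x) for x in xs]
        y_max = sum(b * y for b, y in zip(basis, ys)) / sum(b * b for b in basis)
        ss_res = sum((y - y_max * b) ** 2 for b, y in zip(basis, ys))
        if best is None or ss_res < best[2]:
            best = (y_max, k, ss_res)
    y_max, k, ss_res = best
    r_squared = 1 - ss_res / ss_tot if ss_tot else 1.0
    return y_max, k, r_squared
```

In such a sketch, the fitted y_max estimates the number of allelic variants likely to appear, and the isolate count at which the curve flattens indicates roughly how many isolates are needed to represent species diversity.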

DETAILED DESCRIPTION OF THE INVENTION

This invention is based on the discovery of a family of S. pneumoniae genes that are highly conserved in identical or highly homologous forms throughout all or nearly all known S. pneumoniae strains. The genes of this family are examples of recombinational “cold spots” in the genome where very low levels of sequence variability are observed across the known S. pneumoniae strains. Owing to an exceptionally promiscuous system of horizontal DNA exchange across S. pneumoniae cells, recombinational mutation frequently alters or deletes genes along approximately 74 percent of the S. pneumoniae (pneumococcal) genome. See, Croucher et al., Science, 331: 430-434 (2011). The approach taken for this invention was to identify pneumococcal genes that are specifically localized in recombinational “cold spot” regions of the pneumococcal genome, where mutation frequency is very low. Genes isolated from within such recombinationally quiescent regions are referred to herein as “cold spot” genes, and the polypeptides encoded by such “cold spot” genes are referred to as “cold spot polypeptides”. A “cold spot” gene is characterized by a lack of high sequence variability among its known alleles across all or nearly all strains of S. pneumoniae. Such a lack of variability in nucleotide sequence of a given cold spot gene from strain to strain is in turn reflected in a low variability in the amino acid sequence of the encoded cold spot polypeptide as expressed in all or nearly all strains of S. pneumoniae. Typically, and preferably, a cold spot polypeptide described herein has a greater than 98% average amino acid pairwise homology from strain to strain of S. pneumoniae (i.e., there is greater than 98% average amino acid pairwise homology between allelic forms of the cold spot polypeptide from strain to strain); in most cases, and more preferably, a cold spot polypeptide described herein has a greater than 99% average amino acid pairwise homology from strain to strain of S. pneumoniae, and, in many cases, and even more preferably, a cold spot polypeptide described herein has a greater than 99.5% average amino acid pairwise homology from strain to strain of S. pneumoniae. In the most preferred embodiments, sixteen cold spot polypeptides are disclosed having a percent average amino acid pairwise homology from strain to strain of S. pneumoniae of 99.50% or higher. Such significantly low levels of variability in cold spot gene (nucleotide) sequences and in cold spot polypeptide (amino acid) sequences from strain to strain reflect the presence of an innate and extremely high selective pressure to maintain the existing “cold spot” gene sequences and their encoded polypeptide sequences across all or nearly all known pneumococcal strains and serotypes.
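By way of non-limiting illustration only, an average pairwise figure of the kind recited above may be computed as sketched below. The sketch makes several assumptions that are not taken from this disclosure: percent identity over pre-aligned, equal-length sequences is used as a simple stand-in for “homology”, every pair of supplied sequences is weighted equally, and all function names and tier labels are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent of aligned positions at which two sequences carry the same residue."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and pre-aligned to equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def average_pairwise_identity(sequences):
    """Mean percent identity over every unordered pair of the supplied sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        return 100.0  # a single sequence is trivially identical to itself
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)


def conservation_tier(average_percent):
    """Map an average pairwise value onto the thresholds recited in the text."""
    if average_percent > 99.5:
        return "even more preferred (>99.5%)"
    if average_percent > 99.0:
        return "more preferred (>99%)"
    if average_percent > 98.0:
        return "preferred (>98%)"
    return "not a cold spot polypeptide by this measure"
```

A real analysis would begin from a multiple sequence alignment and a stated scoring scheme; the sketch only shows how the 98%, 99%, and 99.5% thresholds would be applied once per-pair values are available.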

While not intending to be bound by any particular theory or mechanism, the resistance to sequence variability in the cold spot genes suggests that most recombinational events leading to mutations in these genes have lethal consequences, either directly or by impacting adjacent genetic loci (e.g., rrn operons) that are essential for growth.

All of the cold spot genes disclosed herein encode polypeptides that are expressed on the surface of a majority or, indeed, of all or nearly all, e.g., >90%, of S. pneumoniae capsular serotypes. Cold spot polypeptides of the present invention will be expressed on the surface of at least 50, preferably at least 80, more preferably at least 90, and most preferably all of the 94 capsular serotypes currently known. Reflecting the high nucleotide sequence conservation of the cold spot genes, the corresponding amino acid sequences of the encoded S. pneumoniae cold spot polypeptides are also highly conserved across the known S. pneumoniae strains. Accordingly, cold spot polypeptides, extracellular domains of cold spot polypeptides, or immunogenic fragments of the cold spot extracellular domain, are particularly useful in immunogenic and vaccine compositions to elicit production of antibodies that will bind the polypeptides or their antigenic fragments that are expressed on the surface of all or nearly all S. pneumoniae strains. The isolated extracellular domain of a cold spot polypeptide is also referred to as the “cold spot extracellular domain”. Accordingly, a cold spot polypeptide, an isolated cold spot extracellular domain, or an immunogenic fragment comprising all or a portion of the extracellular domain thereof, may be used as the basis of a “universal” or “capsular serotype-independent” vaccine that can elicit antibody that binds all or nearly all S. pneumoniae cells, independent of any S. pneumoniae cell's known or unknown capsular serotype classification. For the purposes of the present invention, a S. pneumoniae surface antigen polypeptide will be considered “capsular serotype-independent” if it is a portion of a surface protein common to all or nearly all serotypes of S. pneumoniae.
Preferred such polypeptides will have an amino acid sequence variability from strain to strain that is so low that it exhibits an average amino acid pairwise homology among known serotypes of S. pneumoniae of greater than 98%, preferably greater than 99%, and most preferably greater than 99.5%. Immunization with such an in-common, highly conserved cold spot polypeptide according to the invention elicits production of antibodies capable of recognizing a supermajority, e.g., greater than 90% of known S. pneumoniae strains.

In order that the invention may be more fully understood, the following terms are defined.

Unless indicated otherwise, when the terms “about” and “approximately” are used in combination with an amount, number, or value, then that combination describes the recited amount, number, or value alone as well as the amount, number, or value plus or minus 10% of that amount, number, or value. By way of non-limiting example, the phrases “about 40%” and “approximately 40%” disclose both “40%” and “from 36% to 44%, inclusive”.
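By way of non-limiting illustration only, the plus-or-minus 10% reading of “about” and “approximately” may be expressed as the short sketch below. The function names are hypothetical and are introduced solely for this example.

```python
def about_range(value, tolerance=0.10):
    """Return the inclusive (low, high) range covered by "about <value>"."""
    delta = abs(value) * tolerance
    return value - delta, value + delta


def is_about(candidate, value, tolerance=0.10):
    """True if candidate lies within "about <value>", endpoints included."""
    low, high = about_range(value, tolerance)
    return low <= candidate <= high
```

Applied to the example in the definition, about_range(40) spans 36 to 44, so a value of 43 would be within “about 40%” while a value of 45 would not.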

The terms “protein” and “polypeptide” are used interchangeably to refer to a polymer of amino acid residues connected by peptide bonding as used and understood in the fields of protein biochemistry and molecular biology. Typically, but not exclusively, the terms “protein” and “polypeptide” are used to refer to polymers of 20 or more amino acid residues linked by peptide bonds. The term “peptide”, like “protein” and “polypeptide”, also refers to a polymer of two or more amino acid residues linked by peptide bonding, but usually, although not exclusively, is used to describe a polymer of less than 20 amino acid residues.

The terms “antigenic determinant” and “epitope” are used synonymously and refer to the specific site on an antigen at which an antibody molecule binds. The antigenic determinant or epitope of an antigen is complementary to the antigen binding site of an antibody. An antigen may have only one or, as is usually the case, several or even many epitopes. Epitopes of a given antigen may be present as multiple copies of structurally identical moieties, as in the case of repetitive amino acid sequences in some proteins, or distinctly different, in which case each epitope could be bound by a different antibody. An epitope may be composed of a sequence of contiguous amino acid residues (linear epitope) along a polypeptide chain or non-sequential (nonlinear) amino acid residues from segments of a polypeptide chain (or even more than one polypeptide chain) that are brought together in a folded, three-dimensional conformation. A minimal sequential (linear) epitope of a polypeptide is typically five to eight amino acid residues in length (see, e.g., Kuby, Immunology, 2nd ed., (W.H. Freeman and Company, New York, 1994), sentence bridging pp. 94-95; Watson et al., In Molecular Biology of the Gene, Fourth Edition (The Benjamin/Cummings Publishing Co., Inc., 1987), p. 836; Davis et al., In Microbiology, Second Edition (Harper & Row, Hagerstown, 1973)). Epitopes of a polypeptide may be at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 contiguous amino acid residues. Epitopes of particular interest for the purposes of this invention are those located in the extracellular domain of a cold spot polypeptide, as such epitopes are expected to be exposed on the surface of S. pneumoniae cells and thus available for binding by an antibody.
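By way of non-limiting illustration only, the contiguous stretches of 5 to 20 residues referred to above may be enumerated from an extracellular domain sequence as sketched below. The function name is hypothetical, the sequence used in any call would be supplied by the user, and the enumeration merely lists candidate windows; it makes no prediction as to which windows are in fact antigenic.

```python
def linear_epitope_windows(domain, min_len=5, max_len=20):
    """Yield (start_index, peptide) for every contiguous window of the domain.

    Window lengths run from min_len to max_len residues, inclusive; start_index
    is zero-based within the supplied domain sequence.
    """
    if min_len < 1 or max_len < min_len:
        raise ValueError("require 1 <= min_len <= max_len")
    for length in range(min_len, max_len + 1):
        for start in range(0, len(domain) - length + 1):
            yield start, domain[start:start + length]
```

Candidate windows produced in this way would still need to be tested experimentally, for example for binding by antibodies raised against whole S. pneumoniae cells.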

The term “antigen”, as used herein, refers to any compound that is bound by an antibody, or antigen-binding fragment of the antibody, wherein the antibody was produced by an immune response elicited by an immunogenic composition comprising an epitope (antigenic determinant), which is contacted by the antigen binding site of the elicited antibody and which is also present on the antigen. The degree of specificity and affinity of an antibody to bind a cognate epitope of an intended (target) antigen is determined by the structural configuration and complementarity determining regions (CDRs) of the antigen binding site of the elicited antibody that provides both spatial and non-covalent bonding features that are necessary to bind and retain the cognate epitope in the antigen-binding site, and thereby form an antibody-target antigen complex to the exclusion of other molecules that lack the structural features of the cognate epitope under the same environmental conditions. It is understood that some molecules may contain cross-reacting epitopes that approximate the features of the cognate epitope of the target antigen so that such molecules may also be bound by the antibody that was raised against the cognate epitope of the target antigen. Typically, however, antibody binding to a cross-reacting epitope is with a lower affinity as compared to the target antigen (or fragment thereof) containing the cognate epitope. Typically such cross-reactivity will be seen between only highly homologous antigens; of significance here is that BLAST comparison of the cold spot gene products described herein against all protein sequences in the NCBI GenBank databases found no homology other than to conserved proteins of other streptococcal strains.

The binding affinity of an antibody to a target antigen, or antigenic fragment thereof, comprising the cognate epitope can be readily determined using any of a number of methods available in the art including, but not limited to, enzyme linked immunosorbent assay (ELISA), surface plasmon resonance-based measurements (for example, using a Biacore® surface plasmon resonance instrument, Biacore® AB, Uppsala, Sweden), immuno-dot blot assay, Western (immuno) blotting, immuno-affinity chromatography, immunoprecipitation, flow cytometry, and fluorescence-activated cell sorting (FACS). More preferably, the affinity of an antibody to a target antigen, or antigenic fragment thereof, is determined using enzyme linked immunosorbent assay (ELISA), surface plasmon resonance-based measurements, flow cytometry, or FACS.

Preferred antigens of the invention include cold spot polypeptides encoded by cold spot genes, isolated cold spot extracellular domains of a cold spot polypeptide (cold spot extracellular domains), and immunogenic fragments of a cold spot extracellular domain. Preferred antibodies, or antigen-binding fragments thereof, bind to an epitope of the cold spot extracellular domain that is expressed on the surface of S. pneumoniae cells.

Unless indicated otherwise, the term “immunogen”, as used herein, refers to any compound or composition that elicits an immune response in a human or non-human subject that has been inoculated with the compound or composition, wherein the immune response includes production of antibody that binds the compound or a component of the composition. Such compounds and compositions are also described as “immunogenic”.

The term “immunogenic fragment”, as used herein with respect to a cold spot polypeptide, refers to a fragment of a cold spot polypeptide that is capable of eliciting an immune response when a mammalian subject is inoculated with the fragment, wherein the elicited immune response includes production of antibodies recognizing the cold spot polypeptide or an antigenic fragment of the cold spot polypeptide.

As used herein, the terms “recombinational cold spot” or simply “cold spot” refer to a gene in the genome of S. pneumoniae that is characterized by the absence of alleles or by a low incidence of sequence variability among known alleles of the gene across the 94 known serotypes of S. pneumoniae, or the terms refer to the encoded polypeptide of such a gene. The low incidence of sequence variability means that the known alleles of a cold spot gene encode cold spot polypeptides that exhibit greater than 98% average pairwise amino acid sequence homology. In most cases, the allelic sequences of a cold spot gene useful in the invention encode polypeptides that show greater than 99% average pairwise amino acid sequence homology, and in many cases more than 99.5% average pairwise amino acid sequence homology across any particular population of pneumococcal isolate strains considered. Thus, the amino acid sequence of a polypeptide (or protein or peptide) encoded by a cold spot gene may be referred to as a cold spot amino acid sequence, and the encoded polypeptide may be referred to as a “cold spot polypeptide”. A preferred “cold spot polypeptide” useful in the invention (1) is encoded by a cold spot gene of S. pneumoniae, (2) is expressed on the surface of S. pneumoniae cells, (3) is in-common across all or nearly all strains of S. pneumoniae, meaning that the surface polypeptide or an allelic variant thereof is encoded in all or nearly all of the known S. pneumoniae genomes, (4) has an amino acid sequence showing greater than 98% average amino acid sequence pairwise homology across such known S. pneumoniae strains, and (5) is immunogenic, that is, capable of eliciting an immune response when administered to a mammal such as a human. Preferably, cold spot polypeptides according to the invention are suitably immunogenic so as to raise an antibody response that is reactive with whole S. pneumoniae cells (i.e., the cold spot protein is an immunogenic S. pneumoniae cell surface antigen). It will be understood from the context of a description herein whether the term “cold spot polypeptide” refers generally to the full-length surface antigen polypeptide encoded by a cold spot gene of S. pneumoniae or to a portion thereof comprising the cold spot extracellular domain that is expressed on the surface of S. pneumoniae cells. The isolated extracellular domain(s) of a cold spot polypeptide may also be referred to as the “cold spot extracellular domain(s)”. Smaller portions of the extracellular domain are also contemplated, such as portions corresponding to the fragment encoded on a convenient restriction fragment found in the coding sequence, so long as the portions are still immunogenic.

As used herein, the term “all or nearly all S. pneumoniae strains” or “all or nearly all strains” means, with respect to a cold spot polypeptide as described herein, that the gene encoding the polypeptide is present in at least 95% of genomes in any collection of at least 123 Streptococcus pneumoniae strains. The cold spot gene products described in Table 1A (or alleles) were found to be in-common or universally present in all of the 231 Streptococcus pneumoniae full genomes available for analysis at the time the present discoveries were made. Most of the additional genes located in the recombinationally quiescent flanking regions adjacent to the rrn operons of S. pneumoniae cited in Table 2 are present in at least 95% of the 123 or more genomic sequences studied to compile that listing. Additional genomes continue to be made available, but the inventor's non-linear regression analysis of cold spot proteins in relation to number of different strains revealed that analysis of at least 150 or so different strains would be accurately representative of the diversity of a particular genetic sequence over the population of genomes compared. Accordingly, for the purposes of the present invention, a gene or gene expression product found to be present in at least 95%, especially 98%, 99%, 99.5%, or up to 100% of strains in any population of at least 123 different Streptococcus pneumoniae strains will be considered universally present, or “in-common”, across the population, or present in “all or nearly all known strains” of Streptococcus pneumoniae. The term “all or nearly all known serotypes”, as used in regard to different serotypes of Streptococcus pneumoniae, means at least 80 of the currently known Streptococcus pneumoniae serotypes. At the time of the present invention, 94 or so Streptococcus pneumoniae serotypes are known, but additional serotypes may be characterized and published in the future.
To provide protection against as many pneumococcal serotypes as possible is a common desire of practitioners in the field of vaccines, and therefore, for the purposes of the present invention, in the context of eliciting immune responses, an immune response reactive with 80, 85, 90, 91, 92, 93, or 94 serotypes will be considered reactive with “all or nearly all known serotypes” of Streptococcus pneumoniae. In the same context, an immune response reactive with 50 or more serotypes will be considered reactive with “a majority of the known serotypes” of Streptococcus pneumoniae.
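By way of non-limiting illustration only, the numerical thresholds in the foregoing definitions may be applied as sketched below. The function names are hypothetical; the default values of 95%, 123 genomes, 80 serotypes, and 50 serotypes are taken from the definitions above.

```python
def is_in_common(genomes_with_gene, genomes_total,
                 min_fraction=0.95, min_collection=123):
    """True if a gene counts as present in "all or nearly all known strains"."""
    if genomes_total < min_collection:
        raise ValueError("collection is smaller than the minimum of "
                         f"{min_collection} strains required by the definition")
    if not 0 <= genomes_with_gene <= genomes_total:
        raise ValueError("gene count must lie between 0 and the collection size")
    return genomes_with_gene / genomes_total >= min_fraction


def serotype_coverage_label(reactive_serotypes):
    """Label an immune response by the number of serotypes it reacts with."""
    if reactive_serotypes >= 80:
        return "all or nearly all known serotypes"
    if reactive_serotypes >= 50:
        return "a majority of the known serotypes"
    return "less than a majority of the known serotypes"
```

Under these definitions, a gene found in all 231 genomes analyzed is in-common, and an immune response reactive with 90 serotypes is reactive with all or nearly all known serotypes.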

The terms “disorder” and “disease” are synonymous and refer to any pathological condition. Diseases of particular interest with respect to this invention are those caused by S. pneumoniae infection of particular tissues, organs or systems, such as pneumonia (lung infection), otitis media (middle ear infection), sinusitis (sinus infection), sepsis (bacteremia, bloodstream infection), and meningitis (infection of the meninges, which comprise three membranes covering the brain and spinal cord).

As used herein, the terms “treatment” and “treating” refer to any regimen that alleviates one or more symptoms or manifestations of a disease or disorder caused by S. pneumoniae, that inhibits progression of a disease or disorder caused by S. pneumoniae, that arrests progression or reverses progression (causes regression) of a disease or disorder caused by S. pneumoniae, or that prevents onset of a disease or disorder caused by S. pneumoniae. Treatment may include prophylaxis (preventative treatment) of one or more diseases caused by S. pneumoniae and includes but does not require cure of such a disease. Treatment may include active immunization resulting in an endogenous immune response that alleviates or combats S. pneumoniae infection or may include the use of exogenously prepared anti-cold spot polypeptide antibodies (for example, polyclonal antiserum or monoclonal antibodies) in passive immunization (immune therapy).

The terms “immunogen” and “immunogenic” refer to any compound or substance that is capable of eliciting an immune response in a human or non-human individual to the compound or substance.

A “vaccine composition,” or simply “vaccine,” has the definition commonly understood in the art and refers to an immunogenic composition, which upon administration to a human or non-human individual, elicits an initial (primary) immune response that produces antibodies that bind to an antigenic compound or substance in the vaccine composition, and wherein a secondary immune response is elicited later when the individual is again exposed to a source of the compound or substance. For example, a vaccine designed to provide an individual with a protective immune response (“protective immunity”, see below) against an infectious agent, such as a pathogenic virus or bacterium, may comprise inactivated or avirulent virus particles, killed bacterial cells, or one or more antigenic polypeptides, polysaccharides, or other components of the pathogenic virus or bacterium, such that a primary immune response elicited to the viral or bacterial components in the vaccine not only produces a first population of antibodies that bind to the viral or bacterial components of the vaccine but also produces memory immune cells that will trigger a secondary immune response against the corresponding virus or bacterium that infects the individual at a later time. Preferably, a secondary immune response is more pronounced than the primary response to the vaccine and is elicited every time the individual is subsequently exposed to an infecting virus or bacterium that expresses or presents the viral or bacterial component(s) present in the vaccine composition. To maintain the ability to elicit a robust secondary immune response, an individual may be administered the vaccine again at a later time as a “boost” (or “booster shot”) that stimulates production of additional memory immune cells in the individual. 
For example, a commonly known vaccine that is periodically followed by one or more boosts (booster shots) to maintain the ability of an individual to elicit a robust secondary immune response is any of the tetanus vaccines currently in use to provide protection from infection by the bacterium Clostridium tetani.

The term “isolated” when used to describe the various nucleic acid and polypeptide molecules disclosed herein, means a nucleic acid or polypeptide that has been identified and separated and/or recovered from the nucleic acid or polypeptide's natural environment. Contaminant components of the natural environment are materials that would typically interfere with a diagnostic, therapeutic, or prophylactic use of a nucleic acid or polypeptide, and may include enzymes, hormones, carbohydrates, lipids, and other proteinaceous or non-proteinaceous molecules. An isolated polypeptide may include a polypeptide in situ within recombinant cells engineered to express it, since at least one component of the polypeptide's natural environment will not be present. Ordinarily, however, an isolated polypeptide will be prepared by at least one purification step. An “isolated nucleic acid”, “isolated polynucleotide”, or an “isolated polypeptide-encoding nucleic acid” is a nucleic acid molecule that is identified and separated from at least one contaminant nucleic acid molecule with which it is ordinarily associated in a natural source of such nucleic acid, e.g., the Streptococcus pneumoniae genome. An isolated nucleic acid (or isolated polynucleotide or isolated polypeptide-encoding nucleic acid) is other than in the form or setting in which it is found in nature. An isolated nucleic acid (or isolated polynucleotide or isolated polypeptide-encoding nucleic acid) therefore is distinguished from the specific polypeptide-encoding nucleic acid molecule as it exists in natural cells. An isolated nucleic acid (or isolated polynucleotide) includes polypeptide-encoding nucleic acid molecules contained in cells that ordinarily express the polypeptide but where, for example, the nucleic acid molecule is in a different location (e.g., a plasmid) from that of natural cells (e.g., the bacterial chromosome).

The term “isolated” or “isolate” when used to describe a cell means that the cell and its genetically uniform sister cells produced by cell division have been taken out of or separated from an environment comprising at least one other, genetically diverse cell, which may be a prokaryotic or eukaryotic cell. For example, cells of a Streptococcus pneumoniae capsular serotype are typically maintained in stock cultures that are generated by single colony purification on an agar plate, wherein a single colony is assumed to contain genetically uniform sister cells of the particular serotype. An “isolate” of S. pneumoniae may also refer to a cell and its genetically uniform sister cells that were purified as a single colony from a human, clinical sample (such as from sputum, bronchial lavage, blood, tissue, organic material) and that have been maintained and passed on as a stock culture.

The term “polynucleotide” as referred to herein means a polymeric form of two or more nucleotides, either ribonucleotides (RNA) or deoxyribonucleotides (DNA), or a modified form of either type of nucleotide. The term includes single and double stranded forms of RNA and DNA, but preferably is double-stranded DNA.

The term “isolated polynucleotide” as used herein shall mean a polynucleotide (e.g., of genomic, cDNA, or synthetic origin, or some combination thereof) that is not associated with all or a portion of a polynucleotide with which the “isolated polynucleotide” is found in nature or is operably linked to a polynucleotide that it is not linked to in nature.

The term “vector”, as used herein, is intended to refer to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. One type of vector is a plasmid, which refers to a circular double stranded DNA molecule into which additional DNA segments may be inserted. Another type of vector is a viral vector wherein additional DNA segments may be inserted into the viral genome. Of particular interest are bacteriophage vectors, used to transduce bacterial host cells. Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors) can be integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome. Moreover, certain vectors are capable of directing the expression of genes to which they are operatively linked. Such vectors are referred to herein as “recombinant expression vectors” or simply, “expression vectors”. In general, expression vectors of utility in recombinant DNA techniques are often in the form of plasmids. In the present specification, “plasmid” and “vector” may be used interchangeably as the plasmid is the most commonly used form of vector. However, the invention is intended to include such other forms of expression vectors, such as viral vectors (e.g., replication defective retroviruses, adenoviruses and adeno-associated viruses), which serve equivalent functions.

The term “operably linked” refers to a juxtaposition wherein the components described are in a relationship permitting them to function in their intended manner. A nucleic acid control sequence “operably linked” to a nucleic acid coding sequence is ligated in such a way that expression of the coding sequence is achieved under conditions compatible with the control sequence. “Operably linked” sequences include both expression control sequences that are contiguous with a gene of interest and expression control sequences that act in trans or at a distance to control the gene of interest. The term “expression control sequence” as used herein refers to polynucleotide sequences that are necessary to effect the expression of coding sequences to which they are ligated. For prokaryotic or eukaryotic host cells, expression control sequences will include appropriate transcription initiation, termination, and promoter sequences. If eukaryotic hosts are employed, other control elements that may be used include enhancer sequences; efficient RNA processing signals, such as splicing and polyadenylation signals; sequences that stabilize cytoplasmic mRNA; sequences that enhance translation efficiency (such as, a Kozak sequence); sequences that enhance protein stability; and, when desired, sequences that enhance protein insertion into or translocation across one or more cell membranes. The term “control sequence” is intended to include components whose presence is essential for gene transcription and translation and processing of expressed gene product, and can also include additional components whose presence is advantageous, for example, leader sequences and fusion partner sequences.

The term “recombinant host cell” or simply “host cell”, as used herein, is intended to refer to a cell into which exogenous nucleic acid has been or can be introduced. It will be understood that such terms are intended to refer not only to the particular subject cell, but also to the progeny of such a cell. Host cells include prokaryotic and eukaryotic cells. Suitable prokaryotic host cells useful in the invention include, but are not limited to, Escherichia coli, Bacillus (for example, B. subtilis), Streptomyces (for example, S. lividans), Salmonella, and Pseudomonas. Eukaryotic cells that may be used as host cells in the present invention include protist cells, fungal cells, insect cells, plant cells, and animal cells. Preferably, an animal host cell is a mammalian cell of an established cell line, such as, but not limited to, CHO, HEK 293, COS, NS0, and SP2 cell lines. A preferred CHO cell carries a dhfr− mutation (as described in Urlaub and Chasin, Proc. Natl. Acad. Sci. USA, 77: 4216-4220 (1980)) and is used with a DHFR selectable marker (such as described in Kaufman and Sharp, J. Mol. Biol., 159: 601-621 (1982)). A preferred insect host cell is a cell of the Sf9 cell line. Preferred fungal host cells are yeast cells, such as Saccharomyces (for example, S. cerevisiae), Kluyveromyces (for example, K. lactis), Schizosaccharomyces (for example, S. pombe), and Pichia (for example, P. pastoris). A particularly preferred yeast host cell is Saccharomyces cerevisiae. The preferred host for prokaryotic expression of polypeptides according to this invention is E. coli. The use of a Streptococcus (for example, S. pneumoniae) strain as a host is possible but not preferred, as it would require separation of the recombinantly expressed cold spot polypeptide from other streptococcal proteins.

The term “physiologically acceptable” as used in reference to a compound or composition means that the compound or composition is compatible with the physiology of a living subject, e.g., a human or a non-human animal, to which the compound or composition is to be administered and also is not deleterious to the cold spot polypeptide, or antigenic fragment thereof, or to a desired property or activity of any other component that may be present in a composition. Any compound or composition used in standard methods to raise polyclonal or monoclonal antibodies is also understood to be sterile and used under sterile conditions to avoid introduction of infectious, undesirable, and/or interfering substances into the subject and/or cells thereof. A physiologically acceptable compound or composition used in producing polyclonal or monoclonal antibodies must of course be compatible with the established protocols. Nevertheless, vehicles (carriers), compounds, and compositions that are acceptable in protocols for raising polyclonal or monoclonal antibodies in non-human animals are not necessarily approved and “pharmaceutically acceptable” for use in humans. “Pharmaceutically acceptable” refers to that which is not only physiologically acceptable but also of the quality and type that is permissible for medical use in treating humans.

A composition or method described herein as “comprising” one or more named elements or steps is open-ended, meaning that the named elements or steps are essential, but other elements or steps may be added within the scope of the composition or method. To avoid prolixity, it is also understood that any composition or method described herein as “comprising” (or “which comprises”) one or more named elements or steps also describes the corresponding, more limited, composition or method “consisting essentially of” (or “which consists essentially of”) the same named elements or steps, meaning that the composition or method includes the named essential elements or steps and may also include additional elements or steps that do not materially affect the basic and novel characteristic(s) of the composition or method. It is also understood that any composition or method described herein as “comprising” or “consisting essentially of” one or more named elements or steps also describes the corresponding, more limited, and close-ended composition or method “consisting of” (or “which consists of”) the named elements or steps to the exclusion of any other unnamed element or step. In any composition or method disclosed herein, known or disclosed equivalents of any named essential element or step may be substituted for that element or step.

Other terms are defined in the text below or, unless indicated otherwise, the meaning of other terms used herein is the same as understood and used by persons in the corresponding field, including, but not limited to, the fields of microbiology, population and evolutionary biology, immunology, genetics, biochemistry, molecular biology, and medicine.

A Family of Cold Spot Genes of S. pneumoniae

The approach taken for this invention was to identify pneumococcal genes (encoding polypeptides) that reside in a portion of the S. pneumoniae genome having very low mutational frequency. In the present case, the regions flanking (e.g., roughly 50 kb upstream and 50 kb downstream of) ribosomal RNA (rRNA) operons of the Streptococcus pneumoniae genome were analyzed. The pneumococcal genome contains four copies of the essential rRNA operons (rrn operons). See, Tettelin et al., Science, 293: 498-506 (2001). It was proposed that genes located in these flanking regions would not exhibit high recombinational mutation frequency, i.e., the “cold spot” or recombinationally quiescent regions of the genome would prove to be highly resistant to incorporation and perpetuation of horizontally exchanged DNA (see, for example, the rationale of Bouchet et al., Clin. Microbiol. Rev., 21(2): 262-273 (2008) in studies of multiple microbial species). Recombination of any of the rrn operons is nearly always a fatal event, and therefore strains evolving mutations or deletions in proximity to the rrn operons do not survive to propagate the genomic changes. Cold spot regions perpetuate genetic conservation because recombination-based mutation nearly always proves lethal, thus purging the vast majority of such recombinational mutants from the bacterial population.

According to the invention, cell surface-exposed cold spot polypeptides encoded by cold spot genes from such recombinationally quiescent regions are highly if not universally conserved across pneumococcal strains. According to the invention, surface antigens encoded by cold spot genes are likely to be particularly well-suited as universal or at least capsular serotype-independent protective antigens common to a majority up to all capsular serotypes of S. pneumoniae. Immunogenic compositions comprising such cold spot polypeptides or fragments thereof comprising all or a portion of the cold spot extracellular domain expressed on the surface of S. pneumoniae cells are useful for eliciting production of antibodies that bind to all or nearly all S. pneumoniae strains. Preferably, such elicited antibodies bind to 90 percent or more of all capsular serotypes. There are about 94 known S. pneumoniae capsular serotypes at the present time, and available capsular polysaccharide-oriented vaccines aim at eliciting an antibody response to a maximum of 23 different serotypes thus far. Because the immunogenic S. pneumoniae surface antigens of the present invention elicit antibodies recognizing less variant bacterial surface structures, it is possible to address many more of the capsular serotypes of S. pneumoniae in a single cold spot polypeptide-containing vaccine. Preferred immunogenic compositions according to the invention will raise antibodies recognizing 50 or more of the capsular serotypes, which represents a significant advance over available vaccines; moreover, since the present compositions comprise capsule-independent immunogens, it is contemplated that the elicited immune response to compositions of the invention will preferably recognize 80 or more, 85 or more, 90 or more, or 94 (e.g., all) capsular serotypes of S. pneumoniae. 
If additional serotypes are isolated, it is contemplated that immunogenic compositions of the present invention will elicit antibodies that recognize those new serotypes as well.

Three intensive, genome-scale, subcellular localization protein predictor programs were employed to analyze the approximately 50 kilobase pair region of each of the eight flanking regions of the four rrn operons in the S. pneumoniae genome: Gpos-mPloc, described by Shen et al., Protein Pept. Lett., 16(12): 1478-1484 (2009); PSORTdb, described by Yu et al., Nucleic Acids Res., 39 (database issue): D241-D244 (2011); and LocateP, described by Zhou et al., BMC Bioinformatics, 9:173 (2008). The analysis using these programs identified 104 genes located within the approximately 400 kilobase pairs that flank the four rrn operons present in the S. pneumoniae genome that putatively encode surface expressed polypeptides. These 104 genes represented ~5% of the roughly 400 kb of genomic sequences analyzed.
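The arithmetic behind the eight flanking regions (four rrn operons, two flanks each, approximately 50 kb per flank, approximately 400 kb in total) can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the operon labels, operon coordinates, and genome length are hypothetical placeholders, "upstream" and "downstream" are taken by coordinate rather than by strand, the genome is treated as circular, and possible overlap between neighboring windows is not handled.

```python
# Illustrative sketch only: operon labels, coordinates, and genome length are
# hypothetical placeholders, not values from the specification.
from typing import List, NamedTuple, Tuple

FLANK_BP = 50_000  # "approximately 50 kilobase pair" flank on each side


class Region(NamedTuple):
    label: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive; end < start means the window wraps the origin


def wrap(position: int, genome_length: int) -> int:
    """Map any integer coordinate onto 1..genome_length of a circular genome."""
    return (position - 1) % genome_length + 1


def flanking_regions(operons: List[Tuple[str, int, int]],
                     genome_length: int,
                     flank: int = FLANK_BP) -> List[Region]:
    """Return the window before and the window after each operon."""
    regions = []
    for name, start, end in operons:
        regions.append(Region(f"{name} upstream",
                              wrap(start - flank, genome_length),
                              wrap(start - 1, genome_length)))
        regions.append(Region(f"{name} downstream",
                              wrap(end + 1, genome_length),
                              wrap(end + flank, genome_length)))
    return regions


def region_length(region: Region, genome_length: int) -> int:
    """Length in bp of a window, allowing for wrap-around at the origin."""
    return (region.end - region.start) % genome_length + 1


# Hypothetical example: four operons on a 2,000,000 bp circular genome.
GENOME_LENGTH = 2_000_000
HYPOTHETICAL_OPERONS = [
    ("rrn1", 15_000, 20_000),
    ("rrn2", 500_000, 505_000),
    ("rrn3", 1_000_000, 1_005_000),
    ("rrn4", 1_500_000, 1_505_000),
]

windows = flanking_regions(HYPOTHETICAL_OPERONS, GENOME_LENGTH)
total_bp = sum(region_length(w, GENOME_LENGTH) for w in windows)
print(len(windows), "windows covering", total_bp, "bp")
```

Genes whose coordinates fall inside any of these windows would then be the candidates passed to the localization predictors.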

The 104 genes from the above analysis were further assessed (by study of the scientific literature pertaining to all 104 proteins and their cognates in other organisms) with respect to biological function of the encoded polypeptide to generate a family of 50 genes that are most likely to encode surface-expressed polypeptides that are vital (essential) to the survival of the cell. The 50 proteins were ranked by likelihood of being essential to cell survival, and the proteins deemed most vital for cell survival were further analyzed by amino acid sequence to identify numbers of alleles and the degree of amino acid homology across the library of 231 S. pneumoniae genomes that represented all known and fully sequenced strains contained in the NCBI GenBank database at the time. The predominant amino acid sequences for the 21 most vital cold spot polypeptides are given in Table 1A, below. In Table 1B, which follows Table 1A, information on the allelic variants of the predominant sequence for each of the 21 cold spot polypeptides is given, and the percent average amino acid sequence pairwise homology across the allelic variants is also given. The GBSP# is an internal laboratory designation for each of the proteins.
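The substitution notation used for the allelic variants (for example, R9K means that arginine at position 9 of the predominant sequence is replaced by lysine) can be applied mechanically to a predominant sequence, and a simple percent identity can then be computed between the two sequences. The sketch below does this for the GBSP1 predominant sequence (SEQ ID NO: 7) and its R9K allele. The function names are assumptions made for this example, only single-residue substitutions are handled (insertions are rejected), and the identity calculation is a plain position-by-position comparison; it is not presented as the method used to derive the averaged homology percentages reported in Table 1B.

```python
# Illustrative sketch only; function names are assumptions for this example.
import re

# One substitution: reference residue, 1-based position, replacement residue.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(reference: str, changes: str) -> str:
    """Apply a comma-separated list such as "K211R, V320I" to a sequence."""
    residues = list(reference)
    for change in (item.strip() for item in changes.split(",")):
        match = SUBSTITUTION.match(change)
        if match is None:
            raise ValueError(f"unsupported notation: {change!r}")
        old, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position out of range in {change!r}")
        if reference[position - 1] != old:
            raise ValueError(f"{change!r} does not match the reference residue "
                             f"{reference[position - 1]!r}")
        residues[position - 1] = new
    return "".join(residues)


def percent_identity(first: str, second: str) -> float:
    """Percentage of identical positions between two equal-length sequences."""
    if len(first) != len(second):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(first, second))
    return 100.0 * matches / len(first)


# GBSP1 (SecE) predominant sequence, SEQ ID NO: 7 (58 aa), from Table 1A.
GBSP1 = (
    "MRFIGDIFRL" "LKDTTWPTRK" "ESWRDFRSIM"
    "EYTAFFVVII" "YIFDQLIVSG" "LIRFINIF"
)

variant = apply_substitutions(GBSP1, "R9K")  # the single GBSP1 allele listed
print(variant)
print(round(percent_identity(GBSP1, variant), 2))  # 57 of 58 positions match
```

Checking the reference residue before substituting guards against transcription errors in either the sequence or the allele notation.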

TABLE 1A
Polypeptides Encoded by Cold Spot Genes

Each entry lists the polypeptide designation/literature acronym (length), the protein function, and the amino acid sequence (SEQ ID NO:).

GBSP3 / MreC (272 aa)
Protein function: shape-determining protein
Amino acid sequence (SEQ ID NO: 1):
MNRFKKSKYV IIVFVTVLLV SALLATTYSS TIVTKLGDGI SLVDRVVQKP FQWFDSVKSD LAHLTRTYNE NESLKKQLYQ LEVKSNEVES LKTENEQLRQ LLDMKSKLQA TKTLAADVIM RSPVSWKQEL TLDAGRSKGA SENMLAIANG GLIGSVSKVE ENSTIVNLLT NTENADKISV KIQHGSTTIY GIIIGYDKEN DVLKISQLNS NSDISAGDKV TTGGLGNFNV ADIPVGEVVA TTHSTDYLTR EVTVKLSADT HNVDVIELVG NS

GBSP14 / PnpS (443 aa)
Protein function: sensor histidine kinase
Amino acid sequence (SEQ ID NO: 2):
MKRYLQFWLV NLSVSLILIA GMALTWISKG IGLFLLALSL GLGGYWLFCL WKWEVAFETL HQPLLTSSEY FLEKGQEDLK SLAQYVSGLK TKVSQQDQQY KDLAETMEVL LSHLTMGTFL VSAQGQMLLS SRSLPHYFPD VDGDISSLDD LKRMDIRNLV HQAFDQKTRL KQEVSGFHEG DLILEVTAVP VFSPTQSVEA VLVLLYDLTT IRTYEKLNLA FVSNASHELR TPVTSIKGFA ETIKGMSAEE EALKDDFLDI IYKESLRLEH IVEHLLTLSK AQQMPIQWTT LSLAEFVQDL TQSLQPQLKK KDLQLKVQVP DDVTLVSDSQ LLSQILLNLL SNAIRYTEQG GKIEVKTQKV NEGIKISVSD TGIGISQLEQ DRIFERFYRV NKGRSRQTGG TGLGLAIVKE LSQLLGGQVT VTSQLGRGSC FTIFLPNQSF AQD

GBSP15 / unannotated protein of PulG superfamily (86 aa)
Protein function: type II secretory pathway pseudopilin
Amino acid sequence (SEQ ID NO: 3):
MILLEAVVAL AIFASIATLL LGQIQKNRQE EAKILQKEEV LRVAKMALQT GQNQVSINGV EIQVFSSEKG LEVYHGSEQL LAIKEP

GBSP22 / CbiQ (264 aa)
Protein function: ABC-type cobalt transport system permease component
Amino acid sequence (SEQ ID NO: 4):
MDSMILGRYI PGDSIVHRLD PRSKLLAMML LILIVFWANN PLTNLILFIA TGIFIALSGV SLSFFIQGLK SMFFLIAFTT IFQLFFISNG NVLFEFSFVR ITDYALQQAG IIFCRFVLII FFSTLLTLTT MPLSLASAVE ALLAPLKRVK VPVHEIGLML SMSLRFVPTL MDDTTRIMNA QKARGVDFGE GSIVQKVKAM IPILIPLFAT SLKRADSLAI AMEARGYQGG KGRSQYRQLK WTLKDTLTIL VILVLGCCLF FLKS

GBSP23 / HtrA (393 aa)
Protein function: serine protease
Amino acid sequence (SEQ ID NO: 5):
MKHLKTFYKK WFQLLVVIVI SFFSGALGSF SITQLTQKSS VNNSNNNSTI TQTAYKNENS TTQAVNKVKD AVVSVITYSA NRQNSVFGND DTDTDSQRIS SEGSGVIYKK NDKEAYIVTN NHVINGASKV DIRLSDGTKV PGEIVGADTF SDIAVVKISS EKVTTVAEFG DSSKLTVGET AIAIGSPLGS EYANTVTQGI VSSLNRNVSL KSEDGQAIST KAIQTDTAIN PGNSGGPLIN IQGQVIGITS SKIATNGGTS VEGLGFAIPA NDAINIIEQL EKNGKVTRPA LGIQMVNLSN VSTSDIRRLN IPSNVTSGVV VRSVQSNMPA NGHLEKYDVI TKVDDKEIAS STDLQSALYN HSIGDTIKIT YYRNGKEETT SIKLNKSSGD LES

GBSP24 / ComD (441 aa)
Protein function: putative sensor histidine kinase
Amino acid sequence (SEQ ID NO: 6):
MDLFGFGTVI VHFLIISHSY HFICKGQINR KELFVFGAYT LLTEIVFDFP LYILYLDGLG IERFLFPLGL YSYFRWMKQY ERDRGLFLSL LLSLLYESTH NFLSVTFSSI TGDNFVLQYH FPFFFVVTVL TYFVTLKIIY YFHLELAYFD EDYLYPFLKK VFFALLLLHI VSFVSDMVST IKHLNSFGSI LSSIVFISLL LTFFAMNSHK VQMEKEIALK QKKFEQKHLQ NYTDEIVGLY NEIRGFRHDY AGMLVSMQMA IDSGNLQEID RIYNEVLVKA NHKLRSDKYT YFDLNNIEDS ALRSLVAQSI VYARNNGVEF TLEVKDTITK LPIELLDLVR IMSVLLNNAV EGSADSYKKQ MEVAVIKMET ETVIVIQNSC KMTMTPSGDL FALGFSTKGR NRGVGLNNVK ELLDKYNNII LETEMEGSTF RQIIRFKREF E

GBSP1 / SecE (58 aa)
Protein function: preprotein translocase subunit E
Amino acid sequence (SEQ ID NO: 7):
MRFIGDIFRL LKDTTWPTRK ESWRDFRSIM EYTAFFVVII YIFDQLIVSG LIRFINIF

GBSP2 / unannotated protein of CorA family (158 aa)
Protein function: magnesium ion transporter
Amino acid sequence (SEQ ID NO: 8):
MKGVTNMTPE EMYLTERLDV QIAHFLKKSV QHRRRYKVLK ITEIVAGFLI AVFCAIPMPG DRYRLISVAL SSLGLLCEGI INLYNAKENW ISYQKTAQLL EKEKFLYQCQ TEKYAGKTKA FALFVKTCEG LISEEINQWE SIQSKEVAAS ADAPVKKE

GBSP4 / MalC (435 aa)
Protein function: permease protein of maltodextrin ABC transporter
Amino acid sequence (SEQ ID NO: 9):
MKGVNMEKQQ PSKAALLSII PGLGQIYNKQ KAKGFIFLGV TIVFVLYFLA LATPELSNLI TLGDKPGRDN SLFMLIRGAF HLIFVIVYVL FYFSNIKDAH TIAKRINNGI PVPRTLKDMI KGIYENGFPY LLIIPSYVAM TFAIIFPVIV TLMIAFTNYD FQHLPPNKLL DWVGLTNFTN IWSLSTFRSA FGSVLSWTII WALAASTLQI VIGIFTAIIA NQPFIKGKRI FGVIFLLPWA VPAFITILTF SNMFNDSVGA INTQVLPILA KFLPFLDGAL IPWKTDPTWT KIALIMMQGW LGFPYIYVLT LGILQSIPND LYEAAYIDGA NAWQKFRNIT FPMILAVAAP TLISQYTFNF NNFSIMYLFN GGGPGSVGGG AGSTDILISW IYRLTTGTSP QYSMAAAVTL IISIIVISIS MIAFKKLHAF DMEDV

GBSP6 / MreD (164 aa)
Protein function: rod shape-determining protein
Amino acid sequence (SEQ ID NO: 10):
MRQLKRVGVF LLLPFFVLID AHISQLLGSF FPHVHLASHF LFLFLLFETI EVSEYLYLVY CFVIGLVYDV YFFHLIGITT LLFILLGAFL HKLNSVILLN RWTRMLAMIV LTFLFEMGSY LLAFMVGLTV DSMSIFIVYS LVPTMILNFL WITVFQFIFE KYYL

GBSP7 / unannotated analogue of PTS subunit IID (271 aa)
Protein function: phosphotransferase system (PTS) transporter
Amino acid sequence (SEQ ID NO: 11):
MTNSNYKLTK EDFNQINKRS LFTFQLGWNY ERMQASGYLY MILPQLRKMY GDGTPELKEM MKVHTQFFNT SPFFHTIIAG FDLAMEEKDG VGSKDAVNGI KTGLMGPFAP LGDTIFGSLV PAIMGSVAAT MAIAGQPWGI FLWIAVAVAY DIFRWKQLEF AYKEGVNLIN NMQSTLTALI DAASVLGVFM MGALVATVIN FEISYKLPIG EKMIDFQDIL NQIFPRLLPA IFTAFIFWLL GKKGMNSTKA IGIIIVLALA LSALGHFALG M

GBSP8 / AmiD (308 aa)
Protein function: permease protein of oligopeptide ABC transporter
Amino acid sequence (SEQ ID NO: 12):
MSTIDKEKFQ FVKRDDFASE TIDAPAYSYW KSVFKQFMKK KSTVVMLGIL VAIILISFIY PMFSKFDFND VSKVNDFSVR YIKPNAEHWF GTDSNGKSLF DGVWFGARNS ILISVIATVI NLVIGVFVGG IWGISKSVDR VMMEVYNVIS NIPPLLIVIV LTYSIGAGFW NLIFAMSVTT WIGIAFMIRV QILRYRDLEY NLASRTLGTP TLKIVAKNIM PQLVSVIVTT MTQMLPSFIS YEAFLSFFGL GLPITVPSLG RLISDYSQNV TTNAYLFWIP LTTLVLVSLS LFVVGQNLAD ASDPRTHR

GBSP9 / unannotated analogue of PTS subunit IID (301 aa)
Protein function: phosphotransferase system (PTS) transporter
Amino acid sequence (SEQ ID NO: 13):
MIQWWQILLL TLYSAYQICD ELTIVSSAGS PVFAGFITGL IMGDVTTGLL IGGNLQLFVL GVGTFGGASR IDATSGAVLA TAFSVSQGID APLAITTIAV PVAALLTYFD VLGRMTTTFF AHRVDAAIER FDYKGIERNY LLGAIPWALS RALPVFFALA FGGAFVQSVV DFVEAYKWVA DGLTLAGRML PGLGFAILLR TLPVKRNLHY LAMGFGLTAM LTVLYSYVTG LGGAVAGIVG TLPAEVAEKI GFVNNFKGLS MIGISIVGIF LAVLHFKNSQ KVAVAAPSTP SESGEIEDDE F

GBSP10 / unannotated analogue of PnuC nicotinamide mononucleotide transporter (269 aa)
Protein function: nicotinamide mononucleotide transporter analogue
Amino acid sequence (SEQ ID NO: 14):
MMHTYLQKKI ENIKTTLGEM SGGYRRMVAA MADLGFSGTM KAIWDDLFAH RSFAQWIYLL VLGSFPLWLE LVYEHRIVDW IGMICSLTGI ICVIFVSEGR ASNYLFGLIN SVIYLILALQ KGFYGEVLTT LYFTVMQPIG LLVWIYQAQF KKEKQEFVAR KLDGKGWTKY LSISVLWWLA FGFIYQSIGA NRPYRDSITD ATNGVGQILM TAVYREQWIF WAATNVFSIY LWWGESLQIQ GKYLIYLINS LVGWYQWSKA AKQNTDLLN

GBSP11 / unannotated analogue of SpoIIIJ family (274 aa)
Protein function: protein involved in membrane insertion
Amino acid sequence (SEQ ID NO: 15):
MKKKLKLTSL LGLSLLIMTA CATNGVTSDI TAESADFWSK LVYFFAEIIR FLSFDISIGV GIILFTVLIR TVLLPVFQVQ MVASRKMQEA QPRIKALREQ YPGRDMESRT KLEQEMRKVF KEMGVRQSDS LWPILIQMPV ILALFQALSR VDFLKTGHFL WINLGSVDTT LVLPILAAVF TFLSTWLSNK ALSERNGATT AMMYGIPVLI FIFAVYAPGG VALYWTVSNA YQVLQTYFLN NPFKIIAERE AVVQAQKDLE NRKRKAKKKA QKTK

GBSP13 / PgsA (181 aa)
Protein function: phosphatidyl transferase
Amino acid sequence (SEQ ID NO: 16):
MKKEQIPNLL TIGRILFIPI FIFILTIGNS IESHIVAAII FAVASITDYL DGYLARKWNV VSNFGKFADP MADKLLVMSA FIMLIELGMA PAWIVAVIIC RELAVTGLRL LLVETGGTIL AAAMPGKIKT FSQMFAIIFL LLHWTLLGQV LLYVALFFTI YSGYDYFKGS AYVFKGTFGS K

GBSP17 / ArsB (436 aa)
Protein function: permease
Amino acid sequence (SEQ ID NO: 17):
MIHLIMISAI ALAIGIGYRT KINIGLLAIA FSYLIATTLM GLSPKELLHF WPTSLFFTIF SVSLFYNVAT TNGTLDVLAQ HILYRTRTHP NALYMILYLM ATLLSALGAG FFTTMAVCCP LAITLCQKAD KHPLIGAQAV NWGASGGANL ITSSSGIVFQ GLFKQMGWEE QAFSLGNHIF IVSIIYPLIV LLLLSCYSHY SKGRTNSSLT IDQPPLLSKV QRQTTLLMIS SMVLVWLFPL LHLIFPNIAW IATYQKTFDI GFVSILMVCL ALRLKLGKQE AILAKVPWAT IIMLCGMSLL MSLAVKSGLV TLIGHLMTTT IPHFWLPLFF CVIAGVMSLF SSTLSVVAPA LFPIIAIISA QNPQIDIHLL TTATVIGALS TNISPFSSAG SLIQLSLPNI EERGLAFKKQ IILGVPISLS LGLLTTWILI LLASLS

GBSP18 / MatE (411 aa)
Protein function: multi-antimicrobial extrusion protein (MATE)
Amino acid sequence (SEQ ID NO: 18):
MGENFLQMLM GMVDSYLVAH LGLIAISGVS VAGNIITIYQ AIFIALGAAI SSVISKSIGQ KDQSKLAYHV TEALKITLLL SFLLGFLSIF AGKEMIGLLG TERDVAESGG LYLSLVGGSI VLLGLMTSLG ALIRATHNPR LPLYVSFLSN ALNILFSSLA IFVLDMGIAG VAWGTIVSRL VGLVILWSQL KLPYGKPTFG LDKELLTLAL PAAGERLMMR AGDVVIIALV VSFGTEAVAG NAIGEVLTQF NYMPAFGVAT ATVMLLARAV GEDDWKRVAS LSKQTFWLSL FLMLPLSFSI YVLGVPLTHL YTTDSLAVEA SVLVTLFSLL GTPMTTGTVI YTAVWQGLGN ARLPFYATSI GMWCIRIGTG YLMGIVLGWG LPGIWAGSLL DNGFRWLFLR YRYQRYMSLK G

GBSP19 / unannotated open reading frame (285 aa)
Protein function: (function unknown)
Amino acid sequence (SEQ ID NO: 19):
MNYPKIDLKT IRQESKHFQA DTPRLFLLYI LPSMLVILSG FLNPLSRIHG TVLEQPFFSI LGQILQTYLF PLLVSFIGTI LLTSSVYATL TLMKDSKTEP SVKNSLALFD EERFSQTFLT LLLKRFYLFL WSIPNLLGIY LLFYSSFLAK KFVTLHPEFP NLDLSSVETE RFLMVFGLYF LASLILIIVG NILYIPQYYA YSQVEFLLCY SLDLGQVPPR RILKTSRSFM KGYKFQHFVL DLQLLPWYFL NWITFGIASF SLLPYIQCTK IMFYRAVLAR KRPKA

GBSP20 / CzcD (299 aa)
Protein function: cation efflux system protein
Amino acid sequence (SEQ ID NO: 20):
MRNMKAKYAV WVAFFLNLTY AIVEFIAGGV FGSSAVLADS VHDLGDAIAI GISAFLETIS NREEDNQYTL GYKRFSLLGA LVTAVILVTG SVLVILENVT KILHPQPVND EGILWLGIIA ITINLLASLV VGKGKTKNES ILSLHFLEDT LGWVAVILMA IVLRFTDWYI LDPLLSLVIS FFILSKALPR FWSTLKIFLD AVPEGLDIKQ VKSGLERLDN VASLNQLNLW TMDALEKNAI VHVCLKEMEH METCKESIRI FLKDCGFQNI TIEIDADLET HQTHKRKVCD LERSYEHQH

GBSP21 / ComB (449 aa)
Protein function: competence factor transport protein
Amino acid sequence (SEQ ID NO: 21):
MKPEFLESAE FYNRRYHNFS SSVIVPMALL LVFLLGFATV AEKEMSLSTR ATVEPSRILA NIQSTSNNRI LVNHLEENKL VKKGDLLVQY QEGAEGVQAE SYASQLDMLK DQKKQLEYLQ KSLQEGENHF PEEDKFGYQA TFRDYISQAG SLRASTSQQN ETIASQNAAA SQTQAEIGNL ISQTEAKIRD YQTAKSAIET GASLAGQNLA YSLYQSYKSQ GEENPQTKVQ AVAQVEAQIS QLESSLATYR VQYAGSGTQQ AYASGLSSQL ESLKSQHLAK VGQELTLLAQ KILEAESGKK VQGNLLDKGK VTASEDGVLH LNPETSDSSM VAEGALLAQL YPSLEREGKA KLTAYLSSKY VARIKVGDSV RYTTTHDAGN QLFLDSTITS IDATATKTEK GNFFKIEAET NLTSEQAEKL RYGVEGRLQM ITGKKSYLRY YLDQFLNKE

TABLE 1B
Allelic Variants of the Cold Spot Polypeptides (Predominant Sequences) in Table 1A

Each entry lists the cold spot polypeptide presented in Table 1A (SEQ ID NO.), the number of alleles found and the % average aa pairwise homology (to predominant sequence), and the amino acid variations for each allele by position in the predominant polypeptide sequence in Table 1A.

GBSP3 allelic variants from predominant sequence (SEQ ID NO: 1)
6 alleles found; 99.89% average aa pairwise homology
1. I205V (SEQ ID NO: 22)
2. I32V (SEQ ID NO: 23)
3. A231V (SEQ ID NO: 24)
4. T67A, L78I, V88A, D103G, A134V, R136K, S186T, I194V, A216T (SEQ ID NO: 25)
5. S141F (SEQ ID NO: 26)
6. T187I (SEQ ID NO: 27)

GBSP14 allelic variants from predominant sequence (SEQ ID NO: 2)
14 alleles found; 99.76% average aa pairwise homology
1. L42Q (SEQ ID NO: 28)
2. L42Q, G88A (SEQ ID NO: 29)
3. Q84K (SEQ ID NO: 30)
4. L42Q, V109I (SEQ ID NO: 31)
5. L38F (SEQ ID NO: 32)
6. V360G (SEQ ID NO: 33)
7. L42Q, G88A, I271S (SEQ ID NO: 34)
8. E58K (SEQ ID NO: 35)
9. L275H (SEQ ID NO: 36)
10. L42Q, V326I (SEQ ID NO: 37)
11. T213I (SEQ ID NO: 38)
12. A23V, L42Q, G88A (SEQ ID NO: 39)
13. Q84K, S235L (SEQ ID NO: 40)
14. L42Q, G88A, A343T (SEQ ID NO: 41)

GBSP15 allelic variants from predominant sequence (SEQ ID NO: 3)
10 alleles found; 99.33% average aa pairwise homology
1. S56N (SEQ ID NO: 42)
2. N53S (SEQ ID NO: 43)
3. M1V (SEQ ID NO: 44)
4. T18P (SEQ ID NO: 45)
5. M1V, N53S (SEQ ID NO: 46)
6. M1V, S56N (SEQ ID NO: 47)
7. A44T, S56N (SEQ ID NO: 48)
8. S56N, E72A (SEQ ID NO: 49)
9. S56N, P86L (SEQ ID NO: 50)
10. V55I, S56N (SEQ ID NO: 51)

GBSP22 allelic variants from predominant sequence (SEQ ID NO: 4)
5 alleles found; 99.94% average aa pairwise homology
1. H154Y (SEQ ID NO: 52)
2. S234I (SEQ ID NO: 53)
3. A199S (SEQ ID NO: 54)
4. M162I (SEQ ID NO: 55)
5. L62F (SEQ ID NO: 56)

GBSP23 allelic variants from predominant sequence (SEQ ID NO: 5)
12 alleles found; 99.62% average aa pairwise homology
1. V320I (SEQ ID NO: 57)
2. K211R (SEQ ID NO: 58)
3. W11S, V320I (SEQ ID NO: 59)
4. E143K, V320I (SEQ ID NO: 60)
5. W11G, T33A, N42S, D91E, R98Q, V301I, N385D (SEQ ID NO: 61)
6. Y8N, S302I, V320I, Q355P (SEQ ID NO: 62)
7. K211R, N283K, G284R (SEQ ID NO: 63)
8. K211R, T227A (SEQ ID NO: 64)
9. K211R, A290V (SEQ ID NO: 65)
10. K211R, V320I (SEQ ID NO: 66)
11. S186N (SEQ ID NO: 67)
12. W11G, T33A, N42S, D91E, R98Q, A147T, V301I, N385D (SEQ ID NO: 68)

GBSP24 allelic variants from predominant sequence (SEQ ID NO: 6)
23 alleles found; 98.37% average aa pairwise homology
1. F4L, H21R, F22L, Q27R, F34Y, F47L, D48E, P50S, L51F, I53L, G58K, L59I, T106I, E151K (SEQ ID NO: 69)
2. F13L, S19N, F22L, E62A, R63I, E151K, A392S (SEQ ID NO: 70)
3. F4L, H21R, F22L, Q27R, F34Y, V35I, F47L, D48E, P50S, I53L, G58K, L59I, E151K, I374M (SEQ ID NO: 71)
4. F4L, H21R, F22L, Q27R, F34Y, F47L, D48E, P50S, L51F, I53L, G58K, L59I, T106I, E151K, M384I (SEQ ID NO: 72)
5. F22L, L51F, E62A, R63T, M77I, S104P, E151K (SEQ ID NO: 73)
6. G264S (SEQ ID NO: 74)
7. F22L, E62A, R63T, M77I, E151K (SEQ ID NO: 75)
8. F4L, H21R, F22L, Q27R, F34Y, F47L, D48E, P50S, L51F, I53L, G58K, L59I, E151K, I374M (SEQ ID NO: 76)
9. Y73N (SEQ ID NO: 77)
10. F13L, S19N, F22L, E62A, R63I, E151K (SEQ ID NO: 78)
11. S19N, F22L, E62A, R63T, M77I, E151K, A392S (SEQ ID NO: 79)
12. F4L, H21R, F22L, Q27R, F34Y, F47L, D48E, P50S, L51F, I53L, G58K, L59I (SEQ ID NO: 80)
13. L3F, G5L, F6L, G7V, T8D, V9L, V11L, H12Y, H21R, F22L, G26D, V35F, F47L, L51F, I53L, E62A, R63T, H120Y, F121G, P122L, E151K, I272V, E439K (SEQ ID NO: 81)
14. A164V (SEQ ID NO: 82)
15. M253T (SEQ ID NO: 83)
16. A280X, I374V (SEQ ID NO: 84)
17. S104Y (SEQ ID NO: 85)
18. G7E, S104Y (SEQ ID NO: 86)
19. S19N, F22L, E62A, R63T, M77I, E151K, E371G, A392S (SEQ ID NO: 87)
20. F22L, E62A, R63T, M77I, E151K, I272V (SEQ ID NO: 88)
21. F22L, M77I, T99I, E151K, V171I, L254F (SEQ ID NO: 89)
22. F4L, H21R, F22L, Q27R, F34Y, F47L, D48E, P50S, L51F, I53L, Y55S, G58K, L59I, E151K (SEQ ID NO: 90)
23. L3F, G5L, F6L, G7V, T8D, V9L, V11L, H12Y, H21R, F22L, G26D, V35F, F47L, L51F, I53L, E62A, R63T, H120Y, F121G, P122L, I272V, E439K (SEQ ID NO: 91)

GBSP1 allelic variant from predominant sequence (SEQ ID NO: 7)
1 allele found; 99.91% average aa pairwise homology
1. R9K (SEQ ID NO: 92)

GBSP2 allelic variants from predominant sequence (SEQ ID NO: 8)
3 alleles found; 99.90% average aa pairwise homology
1. N89H, K102R, D152E, V155G (SEQ ID NO: 93)
2. T127A (SEQ ID NO: 94)
3. V20I (SEQ ID NO: 95)

GBSP4 allelic variants from predominant sequence (SEQ ID NO: 9)
8 alleles found; 99.92% average aa pairwise homology
1. I422V (SEQ ID NO: 96)
2. G39S (SEQ ID NO: 97)
3. D64N (SEQ ID NO: 98)
4. G174S (SEQ ID NO: 99)
5. A332T (SEQ ID NO: 100)
6. G39S, W201L (SEQ ID NO: 101)
7. N5I, T53A, I86V, S94A, T179S, L269F (SEQ ID NO: 102)
8. N5I, T53A, I86V, R114L, S193A, G371A (SEQ ID NO: 103)

GBSP6 allelic variants from predominant sequence (SEQ ID NO: 10)
5 alleles found; 99.69% average aa pairwise homology
1. V70I (SEQ ID NO: 104)
2. Y68H (SEQ ID NO: 105)
3. V67I (SEQ ID NO: 106)
4. I64T (SEQ ID NO: 107)
5. T79A, M105I, L111M, T112S, E116D, S119T, L122F, F124L, M125V, V130L, M133L, S134P, T144S, F149L, L150V, I152M, T153L, V154I, K161R (SEQ ID NO: 108)

GBSP7 allelic variants from predominant sequence (SEQ ID NO: 11)
7 alleles found; 99.88% average aa pairwise homology
1. G117A (SEQ ID NO: 109)
2. A260T (SEQ ID NO: 110)
3. S20R (SEQ ID NO: 111)
4. V198M, L269F (SEQ ID NO: 112)
5. A263T (SEQ ID NO: 113)
6. M49I, L207W, P208A, L264F, H266K, A268V (SEQ ID NO: 114)
7. S93A, N167S, V198L, I203V, Q222S, L264F, H266K, A268V, −272G, −273A (SEQ ID NO: 115)

GBSP8 allelic variants from predominant sequence (SEQ ID NO: 12)
5 alleles found; 99.78% average aa pairwise homology
1. V79I (SEQ ID NO: 116)
2. L122V (SEQ ID NO: 117)
3. A216V (SEQ ID NO: 118)
4. V79I, G106V (SEQ ID NO: 119)
5. V79I, G168R (SEQ ID NO: 120)

GBSP9 allelic variants from predominant sequence (SEQ ID NO: 13)
10 alleles found; 99.86% average aa pairwise homology
1. E248Q (SEQ ID NO: 121)
2. D181Y (SEQ ID NO: 122)
3. A283V (SEQ ID NO: 123)
4. F131Y, S168H, F172L, E174T, A175E, K177Q, V179I (SEQ ID NO: 124)
5. I41V (SEQ ID NO: 125)
6. T47I (SEQ ID NO: 126)
7. F131Y (SEQ ID NO: 127)
8. A164E, S168G, E174T, A175E, K177Q (SEQ ID NO: 128)
9. A91T, P92A, F172L, I238V, V239I, A244K, E245D (SEQ ID NO: 129)
10. A91T, P92A, A164E, S168H, E174T, A175K, K177Q, A244K, E245D (SEQ ID NO: 130)

GBSP10 allelic variants from predominant sequence (SEQ ID NO: 14)
9 alleles found; 99.84% average aa pairwise homology
1. G164D (SEQ ID NO: 131)
2. L268P (SEQ ID NO: 132)
3. Q186R (SEQ ID NO: 133)
4. G38E (SEQ ID NO: 134)
5. A212T (SEQ ID NO: 135)
6. T168A (SEQ ID NO: 136)
7. W80C (SEQ ID NO: 137)
8. Q149R (SEQ ID NO: 138)
9. M2I (SEQ ID NO: 139)

GBSP11 allelic variants from predominant sequence (SEQ ID NO: 15)
12 alleles found; 99.81% average aa pairwise homology
1. G25E, H158Q (SEQ ID NO: 140)
2. E47K (SEQ ID NO: 141)
3. V67I (SEQ ID NO: 142)
4. S166G (SEQ ID NO: 143)
5. L41S (SEQ ID NO: 144)
6. E47K, R70C (SEQ ID NO: 145)
7. E47K, A255T (SEQ ID NO: 146)
8. A222T, V253A (SEQ ID NO: 147)
9. F211L (SEQ ID NO: 148)
10. P102S (SEQ ID NO: 149)
11. T19A (SEQ ID NO: 150)
12. G25E, H158Q, F243L (SEQ ID NO: 151)

GBSP13 allelic variants from predominant sequence (SEQ ID NO: 16)
8 alleles found; 99.61% average aa pairwise homology
1. M83V (SEQ ID NO: 152)
2. M83I (SEQ ID NO: 153)
3. I22L (SEQ ID NO: 154)
4. A122S (SEQ ID NO: 155)
5. M124I (SEQ ID NO: 156)
6. L84F (SEQ ID NO: 157)
7. A90D (SEQ ID NO: 158)
8. A122S, G169S (SEQ ID NO: 159)

GBSP17 allelic variants from predominant sequence (SEQ ID NO: 17)
22 alleles found; 99.51% average aa pairwise homology
1. Q255R (SEQ ID NO: 160)
2. V68F, T70I, H89Y, Q255R, T320S (SEQ ID NO: 161)
3. V68F, T70I, H89Y, E170D, T320S (SEQ ID NO: 162)
4. A92T (SEQ ID NO: 163)
5. T70I, Q255R (SEQ ID NO: 164)
6. V68F, T70I, H89Y, R204K, T320S (SEQ ID NO: 165)
7. V68F, T70I, H89Y, T320S (SEQ ID NO: 166)
8.
G108S (SEQ ID NO: 167) 9. H89Y (SEQ ID NO: 168) 10. Q255R, H315Q (SEQ ID NO: 169) 11. T37I, T70I, H89Y (SEQ ID NO: 170) 12. T70I, H89Y (SEQ ID NO: 171) 13. V68F, T70I, H89Y (SEQ ID NO: 172) 14. T70I, H89Y, Q255R, T319I, T320S (SEQ ID NO: 173) 15. T70I, H89Y, Q255R, T320S (SEQ ID NO: 174) 16. R273K (SEQ ID NO: 175) 17. A116V (SEQ ID NO: 176) 18. A249V (SEQ ID NO: 177) 19. Y18F, T113A (SEQ ID NO: 178) 20. T70I (SEQ ID NO: 179) 21. T70I, H89Y, Q255R, T320S, G422E (SEQ ID NO: 180) 22. V68F, T70I, T86I, H89Y, Q255R, T320S, G404A (SEQ ID NO: 181) GBSP18 allelic 21 alleles found 1. V177L, M373L (SEQ ID NO: 182) variants from 99.50% average 2. V177L, S408N (SEQ ID NO: 183) predominant aa pairwise 3. A169T, V177L, P197L, V344I, M373L (SEQ sequence (SEQ homology ID NO: 184) ID NO: 18) 4. V177L (SEQ ID NO: 185) 5. V177L, S408I (SEQ ID NO: 186) 6. A169T, V177L, P197L, S408I (SEQ ID NO: 187) 7. S408N (SEQ ID NO: 188) 8. V177L, P197L (SEQ ID NO: 189) 9. S408I (SEQ ID NO: 190) 10. P142L (SEQ ID NO: 191) 11. A237T, S408N (SEQ ID NO: 192) 12. V177L, G195E (SEQ ID NO: 193) 13. G195E (SEQ ID NO: 194) 14. V177L, P197L, T260M (SEQ ID NO: 195) 15. L159X (SEQ ID NO: 196) 16. A317V (SEQ ID NO: 197) 17. K203R, S408I (SEQ ID NO: 198) 18. V177L, S297Y, M373L (SEQ ID NO: 199) 19. A169T, V177L, P197L, P295L, S408I (SEQ ID NO: 200) 20. Q7H, A169T, V177L, T285I, S408I (SEQ ID NO: 201) 21. Q7H, A169T, G170S, T285I, S408I (SEQ ID NO: 202) GBSP19 allelic 17 alleles found 1. G40S, T67A (SEQ ID NO: 203) variants from 99.37% average 2. G40S, T67A, H237R (SEQ ID NO: 204) predominant aa pairwise 3. T67A, Y210H (SEQ ID NO: 205) sequence (SEQ homology 4. G40S, H237R (SEQ ID NO: 206) ID NO: 19) 5. G40S, T67A, V239I (SEQ ID NO: 207) 6. G40S (SEQ ID NO: 208) 7. G40S, T67A, Y140H, N161S (SEQ ID NO: 209) 8. G40S, T67A, R275Q (SEQ ID NO: 210) 9. G40S, T67A, S183G, H237R (SEQ ID NO: 211) 10. G40S, T67A, S75F, H237R (SEQ ID NO: 212) 11. G40S, T67A, S75F, K150Q, E170D, H237R (SEQ ID NO: 213) 12. 
L245I (SEQ ID NO: 214) 13. G78R (SEQ ID NO: 215) 14. T67A, V153A, Y210H (SEQ ID NO: 216) 15. R12H, L27F, G40S (SEQ ID NO: 217) 16. T67A, H237R (SEQ ID NO: 218) 17. T67A, H237R, I271L (SEQ ID NO: 219) GBSP20 allelic 20 alleles found 1. A47V, P189S (SEQ ID NO: 220) variants from 99.31% average 2. A47V, L81V, G214D (SEQ ID NO: 221) predominant aa pairwise 3. T58L, A222V (SEQ ID NO: 222) sequence (SEQ homology 4. V30I, G45V, S143N, P189S (SEQ ID ID NO: 20) NO: 223) 5. P189S (SEQ ID NO: 224) 6. S143N, P189S (SEQ ID NO: 225) 7. I52M, H250Y (SEQ ID NO: 226) 8. T83I, P189S (SEQ ID NO: 227) 9. V30I, G45V, A222V (SEQ ID NO: 228) 10. A222V (SEQ ID NO: 229) 11. A47V, A222V (SEQ ID NO: 230) 12. V10A, G45V (SEQ ID NO: 231) 13. A47V (SEQ ID NO: 232) 14. A187V (SEQ ID NO: 233) 15. V30I, G45V, L81V, V131I, A222V (SEQ ID NO: 234) 16. V30I, G45V, L81V, A222V (SEQ ID NO: 235) 17. V30I, G45V, L125M (SEQ ID NO: 236) 18. V30I, A222V, I260N (SEQ ID NO: 237) 19. V10A, A21T, A47V, A222V (SEQ ID NO: 238) 20. A47V, L81V, G214D, G266V (SEQ ID NO: 239) GBSP21 allelic 34 alleles found 1. A202T, V311I, Y360D (SEQ ID NO: 240) variants from 99.13% average 2. I198M, G206S, V311I, Y360D (SEQ ID predominant aa pairwise NO: 241) sequence (SEQ homology 3. S22R, A28S, V40F, M45I, T227I, ID NO: 21) V311I, Y360D (SEQ ID NO: 242) 4. S182N, I198M, G206S, V311I, Y360D (SEQ ID NO: 243) 5. V53I, I198M, G206S, G257A, V311I, Y360D (SEQ ID NO: 244) 6. I198M, G206S, V311I, Y360D (SEQ ID NO: 245) 7. E125G, V311I, A335T, Y360D (SEQ ID NO: 246 ) 8. V311I, Y360D (SEQ ID NO: 247) 9. D85E, A94V, A149E, A194V, V311I, Y360D (SEQ ID NO: 248) 10. T227I, V311I, Y360D (SEQ ID NO: 249) 11. A94T, I198M, G206S, V311I, Y360D (SEQ ID NO: 250) 12. A175S, V311I, Y360D, L382F (SEQ ID NO: 251) 13. A194V, V311I, Y360D, T410I (SEQ ID NO: 252) 14. A202T, T286S, V311I, A335T, Y360D (SEQ ID NO: 253) 15. A175S, V311I, Y360D (SEQ ID NO: 254) 16. H129R, S182N, K195Q, I198M, G206S, V311I, Y360D (SEQ ID NO: 255) 17. 
G84E, S182N, I198M, G206S, V311I, Y360D (SEQ ID NO: 256) 18. I198M, G206S, K300N, V311I, Y360D (SEQ ID NO: 257) 19. A202T, G282C, V311I, Y360D (SEQ ID NO: 258) 20. L213F, V311I, Y360D (SEQ ID NO: 259) 21. A202T (SEQ ID NO: 260) 22. S182N, K195Q, I198M, G206S, V311I, Y360D (SEQ ID NO: 261) 23. I198M, G206S, V311I, A350T, Y360D (SEQ ID NO: 262) 24. A175S, I198M, G206S, V311I, Y360D (SEQ ID NO: 263) 25. V40I, A202T, G206S, V311I, Y360D (SEQ ID NO: 264) 26. A202T, V311I, Y360D, L382F (SEQ ID NO: 265) 27. E125G, I198M, V311I, Y360D (SEQ ID NO: 266) 28. A175S, V311I, Y360D (SEQ ID NO: 267) 29. Y138H, L152I, A175S, V311I, Y360D (SEQ ID NO: 268) 30. A175S, V311I, Y360D, T410I, L428I (SEQ ID NO: 269) 31. A175S, L213F, A231V, V311I, Y360D (SEQ ID NO: 270) 32. L75S (SEQ ID NO: 271) 33. A149V, A194V, V311I, Y360D, T410I (SEQ ID NO: 272) 34. A194V, V311I, Y360D (SEQ ID NO: 273)
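
The substitution notation used in Table 1B (for example, I205V denotes isoleucine at position 205 of the predominant sequence replaced by valine) may be illustrated with the following Python sketch. It is offered only as an illustration under stated assumptions: the function name `apply_substitutions` and the ten-residue toy sequence are hypothetical and are not sequences from Table 1A; positions are taken to be 1-based; and only simple substitutions are handled, so insertion-type entries such as those listed for GBSP7 allele 7 are rejected rather than interpreted.

```python
import re

# A simple substitution such as "I205V": original residue, 1-based position,
# replacement residue. "X" (unspecified residue) also matches this pattern.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(predominant: str, variations: str) -> str:
    """Apply comma-separated substitutions (e.g. "L42Q, G88A") to a sequence.

    Raises ValueError if an entry is not a simple substitution, if a position
    lies outside the sequence, or if the stated original residue does not
    match the residue actually present at that position.
    """
    residues = list(predominant)
    for token in variations.split(","):
        token = token.strip()
        match = _SUBSTITUTION.match(token)
        if match is None:
            raise ValueError(f"unsupported variation: {token!r}")
        original = match.group(1)
        index = int(match.group(2)) - 1  # convert 1-based position to 0-based
        replacement = match.group(3)
        if not 0 <= index < len(residues):
            raise ValueError(f"position out of range: {token!r}")
        if residues[index] != original:
            raise ValueError(f"residue mismatch at {token!r}")
        residues[index] = replacement
    return "".join(residues)


# Hypothetical toy sequence for illustration only (not from Table 1A).
toy_predominant = "MKTLLIAGSV"
toy_allele = apply_substitutions(toy_predominant, "T3A, I6V")
```

Checking the stated original residue against the sequence before substituting is a deliberate choice in this sketch: it catches a variation list that has been paired with the wrong predominant sequence.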

Alignment and analysis of genetic sequence data from isolates of S. pneumoniae present in the publicly accessible National Center for Biotechnology Information (NCBI) GenBank database (National Center for Biotechnology Information (NCBI), U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda Md., 20894, USA) was performed using the MEGABLAST® program for aligning nucleotide sequences (Altschul et al., J. Mol. Biol., 215(3): 403-410 (1990); Zhang et al., J. Comput. Biol., 7 (1/2): 203-214 (2000)) and BLASTP for aligning amino acid sequences (Altschul et al. (1990)) to determine the degree of conservation across the genomic sequences. Eventually 231 complete S. pneumoniae genomes were analyzed. This analysis involved aligning over 10 million nucleotide sequences and over 3 million corresponding amino acid sequences. Where possible, corresponding cold spot proteins from non-fully sequenced pneumococcal genomes in the Microbial Genomes database of NCBI were also included in these alignment studies. Thus, for many of the disclosed cold spot polypeptides, sequence homology data included more than the 231 related sequences from the fully sequenced pneumococcal genomes.

By this alignment analysis, 16 putative surface-expressed antigens encoded by cold spot genes were identified that had an average amino acid pairwise sequence homology of 99.50% or greater. A group of the top 21 cold spot genes most likely to be essential by locus criticality (e.g., proximity to essential rrn operons, see, for example, Bouchet et al., Clin. Microbiol. Rev., 21: 2262-2273 (2008)) and/or function, encoding polypeptides corresponding to SEQ ID NOs: 1-21 in Table 1A (and their alleles, SEQ ID NOs: 22-273 in Table 1B), were selected for further characterization. Using dot blot DNA hybridization, the top 21 cold spot genes were also surveyed for species-wide commonality (presence) across the evolutionary diversity of a worldwide, phylogenetically organized collection of more than 2500 pneumococcal isolates from North America, South America, Europe, Africa, South Asia, and China. The results confirmed that the 21 genes were common to (“present in”, “in-common” among) the more than 2500 S. pneumoniae isolates in the inventor's laboratory collection, indicating that all 21 of the S. pneumoniae proteins were encoded in the genome of every known strain of S. pneumoniae; in other words, such cold spot genes were universally present in the species.
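
The "average amino acid pairwise sequence homology" figure used as a selection criterion above may be illustrated with the following Python sketch. The exact calculation used in the analysis is not stated here, so the sketch rests on assumptions: sequences are already aligned to equal length, homology is approximated as ungapped percent identity, and the function names and toy sequences are hypothetical. It is not a substitute for the MEGABLAST and BLASTP alignments described above.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def average_pairwise_identity(sequences: list[str]) -> float:
    """Mean percent identity over every unordered pair of sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy data: three copies of one sequence plus one variant that
# differs at a single position out of ten (not sequences from Table 1A).
toy_sequences = ["MKTLLIAGSV", "MKTLLIAGSV", "MKTLLIAGSV", "MKALLIAGSV"]
toy_average = average_pairwise_identity(toy_sequences)

# Selection threshold recited in the text for the 16 identified antigens.
meets_threshold = toy_average >= 99.50
```

In the toy data, three of the six pairs are identical and three differ at one position in ten, so the average is 95.0% and the set does not meet the 99.50% threshold.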

Production of Cold Spot Polypeptides and Fragments Thereof

Cold spot polypeptides, and fragments thereof, of the present invention may be produced by any of a number of techniques known in the art including recombinant genetic engineering, chemical synthesis, and cell-free translation.

For example, a cold spot polypeptide, or immunogenic fragment thereof, can be produced using methods of conventional recombinant nucleic acid technology to express a cold spot gene or portion thereof in cultured cells. See, for example, Sambrook et al., eds., Molecular Cloning: a Laboratory Manual (3d ed.) (Cold Spring Harbor Press, 2001); Ausubel et al. (eds.), Current Protocols in Molecular Biology (John Wiley & Sons, Inc., New York, 1994); Innis et al. (eds.), PCR Protocols (Academic Press, New York, 1990). For example, a nucleic acid molecule encoding a cold spot polypeptide, or fragment thereof, may be inserted into an expression vector, such that the open reading frame is properly oriented for the expression of the encoded protein under the control of a promoter of choice that is compatible with a particular eukaryotic or prokaryotic host cell. In addition to a promoter, an expression vector should also contain the necessary elements for the transcription of the inserted nucleic acid encoding the desired polypeptide.

By way of non-limiting example, for expression of a cold spot polypeptide or fragment thereof in host cells, an expression vector encoding a cold spot polypeptide, or desired fragment thereof, can be introduced into a host cell by standard techniques. A wide variety of techniques may be used for the introduction of exogenous DNA into a prokaryotic or eukaryotic host cell, including, but not limited to, electroporation (for example, Neumann et al., EMBO J., 1(7):841-845 (1982); Wong et al., Biochem. Biophys. Res. Commun., 107(2):584-587 (1982); Potter et al., Proc. Natl. Acad. Sci. USA, 81(22):7161-7165 (1984), incorporated herein by reference), calcium-phosphate precipitation, DEAE-dextran transfection, lipofection, protoplast fusion, particle bombardment, polyethylene glycol-mediated DNA uptake (see, for example, Sambrook et al. (eds.), Molecular Cloning: A Laboratory Manual, second edition (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989), which is hereby incorporated by reference in its entirety), and fusion of protoplasts with other entities (for example, minicells, other cells, liposomes, or other fusible lipid-surfaced bodies that contain the desired cold spot gene or coding sequence). See, for example, Fraley et al., Proc. Natl. Acad. Sci. USA, 79(6): 1859-1863 (1982), which is hereby incorporated by reference in its entirety. Such methods of introducing DNA into a target cell are variously referred to in the art as, e.g., “transformation”, “conjugation”, “transduction”, “transfection”, etc., and all such methods and techniques are contemplated and may be appropriately used according to the desires of the practitioner.

Using standard methods, such as described above, a nucleic acid molecule encoding a cold spot polypeptide, or fragment thereof (for example, an immunogenic fragment), can be cloned into an expression vector, which in turn is introduced into a host cell for expression of the encoded cold spot polypeptide or fragment thereof. A variety of expression systems comprising suitable expression vectors and host cells are available in the art for expressing a recombinant nucleic acid encoding a cold spot polypeptide or fragment thereof. Examples of suitable host cells include, but are not limited to, bacterial cells, fungal cells, mammalian cells, insect cells, plant cells, and protist cells. For expression of large quantities of cold spot polypeptides, prokaryotic hosts will typically be employed. Suitable prokaryotic host cells useful in the invention include, but are not limited to, Escherichia coli, Bacillus (for example, B. subtilis), Streptomyces (for example, S. lividans), Salmonella, and Pseudomonas. Escherichia coli is preferred. The use of a Streptococcus (for example, S. pneumoniae) strain as a host is possible but not preferred, as it would require separation of the recombinantly expressed cold spot polypeptide from other streptococcal proteins.

Under certain circumstances, expression in a eukaryotic host cell may be favored. Preferred mammalian host cells for expressing a recombinant cold spot polypeptide or fragment thereof of the invention include, but are not limited to, Chinese Hamster Ovary (CHO) cells (including dhfr− CHO cells, described in Urlaub and Chasin, Proc. Natl. Acad. Sci. USA, 77:4216-4220 (1980), preferably used with a DHFR selectable marker as described in Kaufman and Sharp, J. Mol. Biol., 159: 601-621 (1982)), NS0 myeloma cells, COS cells, and SP2 cells.

By way of non-limiting example, when a recombinant expression vector carrying an inserted S. pneumoniae cold spot gene is introduced into a host cell, the cold spot polypeptide is produced by culturing the host cell for a period of time sufficient to allow for expression of the cold spot polypeptide in the host cells. Cold spot polypeptides can be recovered from the cultured host cells using standard protein purification methods and then formulated into immunogenic compositions or vaccines following standard methods and protocols in the art.

As should be clear from the above comments, a variety of methods are available in the art for the skilled practitioner to express cold spot polypeptides or fragments thereof. Alternatively, in view of the full disclosure herein of the amino acid sequence encoded by each cold spot gene (Table 1A and Table 1B), a cold spot polypeptide or fragment thereof may also be produced to order from a contract research organization. Similarly, polyclonal or monoclonal antibodies that bind to a particular cold spot polypeptide or antigenic fragment thereof may be produced using well known immunization techniques, which may also be performed as a service by contract research organizations, such as ProMab (Richmond, Calif.) or GenScript (Piscataway, N.J.).

Compositions Comprising Cold Spot Polypeptides and Antigenic Fragments Thereof

The polypeptides encoded by the recombinational cold spot genes described above (“cold spot polypeptides”) are useful in immunogenic compositions that are capable of eliciting an immune response to the cold spot polypeptide or an antigenic fragment thereof (such as the isolated cold spot extracellular domain). An immunogenic composition of the invention comprises at least one cold spot polypeptide or an immunogenic fragment thereof. Such compositions may also advantageously include a pharmaceutically acceptable carrier (vehicle) or diluent and/or an adjuvant to enhance elicitation of an immune response. Immunogenic compositions of the invention may be used to raise polyclonal or monoclonal antibodies capable of specific binding to the cold spot polypeptide, or an immunogenic fragment thereof, using standard methods and compositions known in the art.

For purposes of vaccine development, cold spot polypeptides comprising the native amino acid sequence of an in-common S. pneumoniae surface antigen, in particular the extracellular domain of the surface antigen, or an immunogenic fragment thereof, are preferred. Preferably, the predominant cold spot amino acid sequence among known alleles will be used, but other allelic sequences may be used and are expected to be effective, due to the extremely high sequence similarity among alleles, exceeding 95%, and usually exceeding 99%, average amino acid sequence pairwise homology, among the 231 known S. pneumoniae genomes. Although less preferred, a minor degree of artificial mutation from the predominant native amino acid sequence can be tolerated without destroying the desired immunogenicity that produces antibodies recognizing the native antigenic polypeptide and natural pneumococcal cells. Such artificial mutations may be introduced for other reasons, such as increasing recombinant expression, avoiding intracellular protease digestion, improving solubility, aiding purification, and the like; and, accordingly, amino acid variants of the aforementioned cold spot polypeptides that maintain at least 90%, preferably 95%, and most preferably 98% or more sequence identity with the predominant native amino acid sequence for any given cold spot polypeptide may be used, where the variant polypeptide maintains immunogenicity, i.e., the ability to elicit production of antibodies reactive with the naturally occurring cold spot polypeptide.
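
The sequence identity thresholds recited above translate directly into a maximum number of substituted positions for a polypeptide of a given length. The following Python sketch illustrates that arithmetic under stated assumptions: only substitutions are considered (no insertions or deletions), identity is computed over the full length of the predominant sequence, thresholds are whole-number percentages, and the 400-residue length is a hypothetical example rather than the length of any polypeptide in Table 1A. The function name `max_substitutions` is likewise hypothetical.

```python
def max_substitutions(length: int, min_identity_percent: int) -> int:
    """Largest number of substituted positions that keeps identity at or
    above the given whole-number percentage for a sequence of this length.

    Identity is (length - k) / length * 100 for k substitutions, so the
    condition identity >= p is equivalent to k <= length * (100 - p) / 100.
    Integer arithmetic is used so the result is exact.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if not 0 <= min_identity_percent <= 100:
        raise ValueError("identity must be between 0 and 100")
    return length * (100 - min_identity_percent) // 100


# Hypothetical 400-residue polypeptide, evaluated at the three recited tiers.
toy_length = 400
allowed = {tier: max_substitutions(toy_length, tier) for tier in (90, 95, 98)}
```

For the hypothetical 400-residue example, the 90%, 95%, and 98% tiers allow at most 40, 20, and 8 substituted positions, respectively.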

An immunogenic composition comprising a cold spot polypeptide, or immunogenic fragment thereof, may be used in any of a variety of methods known in the art for raising polyclonal or monoclonal antibodies to the polypeptide or antigenic fragment thereof. Such methods will typically require administration to a non-human animal (for example, mouse, rat, rabbit, guinea pig, hamster, chicken) of a composition comprising a cold spot polypeptide, or antigenic fragment thereof, that is dissolved, suspended, or otherwise contained within a physiologically acceptable vehicle and may further comprise one or more additional physiologically acceptable components including, but not limited to, an adjuvant and/or an excipient.

Preferred immunogenic compositions and vaccines according to the invention comprise a cold spot polypeptide, or an immunogenic fragment thereof, as described herein, and may further comprise one or more pharmaceutically acceptable components, such as a pharmaceutically acceptable vehicle, adjuvant, excipient, or other ingredient. Immunogenic compositions and vaccines described herein are understood to be dispersed in sterile vehicles and used under sterile conditions to avoid introduction of infectious agents and/or undesirable or interfering substances into the human or non-human subject.

Pharmaceutical vehicles or pharmaceutical carriers useful in compositions described herein for eliciting an immune response or producing antibodies to a cold spot polypeptide, or immunogenic fragment thereof, are well known to practitioners in the art and include, but are not limited to, sterile water; pyrogen-free preparations of sterile water; saline; phosphate buffered saline (PBS); dextrose; glycerol; ethanol; and combinations thereof. In some compositions, it may be preferable to include physiological or pharmaceutically acceptable isotonic agents, for example, but not limited to, sugars; polyalcohols, such as mannitol or sorbitol; sodium chloride; and combinations thereof. Pharmaceutically acceptable carriers may further comprise minor amounts of auxiliary substances such as wetting or emulsifying agents, preservatives, or buffers to enhance the shelf life or effectiveness of an immunogenic composition or vaccine.

Excipients may also be used in pharmaceutical compositions according to the invention. An excipient is generally any compound or combination of compounds that provides a desired feature to a composition. The pH may be adjusted in a composition as necessary, for example, to promote or maintain solubility of component ingredients, to maintain stability of one or more ingredients in a composition, and/or to deter undesired growth of microorganisms that inadvertently may be introduced at some point during the preparation or use of a composition or vaccine.

Particularly with respect to raising polyclonal antibodies in non-human animals, a polypeptide may be conjugated to or otherwise associated with a carrier protein or matrix that enhances the immune response to the polypeptide. The term “carrier protein” in the context of an immunogenic composition refers to a protein that elicits an immune response to itself and/or to an antigen conjugated or otherwise associated with or complexed with such carrier protein. In a conjugate immunogenic composition, an antigen is reacted with a carrier protein, so that the antigen and carrier protein are covalently linked to each other by design. Preferably, a carrier protein contains epitopes recognized by a helper T-cell to stimulate B cells that produce antibodies to the antigen that is conjugated to the carrier protein. Carrier proteins that are useful in raising antibodies to a polypeptide that is conjugated or otherwise associated with such carrier proteins include, but are not limited to, keyhole limpet hemocyanin (KLH), blue carrier protein, ovalbumin, bovine serum albumin, and derivatives thereof (for example, pegylated forms). Also encompassed by the definition of a “carrier protein” are multi-antigenic peptides (MAPs), which are branched peptides having a plurality of reactive sites to which antigenic molecules may be linked. Preferably, a MAP includes lysine (Lys) residues that can link to antigenic molecules. Exemplary carrier proteins may also include, but are not limited to, bacterial toxins and toxoids, which may be mutated or chemically treated, for example, to reduce undesirable reactogenicity with tissues of an individual. Such carrier proteins include, for example, diphtheria toxin or a non-toxic mutant thereof, for example, diphtheria toxoid; tetanus toxin, or a non-toxic mutant thereof, for example, tetanus toxoid; Pseudomonas aeruginosa exotoxin A, or a non-toxic mutant thereof; cholera toxin B subunit; tetanus toxin fragment C; bacterial flagellin; pneumolysin; listeriolysin O (LLO, and related molecules); an outer membrane protein of Neisseria meningitidis; Pseudomonas aeruginosa Hcp1 protein; Escherichia coli heat labile enterotoxin; Shiga-like toxin; human LTB protein; the dominant negative inhibitor mutant (DNI) of the Protective Antigen of Bacillus anthracis; and Escherichia coli beta-galactosidase.

A polypeptide can be conjugated to a carrier protein using standard protocols and cross-linking (or “coupling”) agents known in the art. Some cross-linking agents are “homobifunctional” coupling agents that have two identical reactive sites to link a polypeptide to a carrier protein. Some cross-linking agents are “heterobifunctional” coupling agents that have two different reactive sites to link a polypeptide (at one site) with a carrier protein (at the other different site). Examples of cross-linking agents that may be used for conjugating a polypeptide to a carrier protein for raising antibody to the polypeptide include, but are not limited to, any one of glutaraldehyde, maleimide agents (for example, m-maleimidobenzoyl-N-hydroxysuccinimide ester), carbodiimide agents, and bis-diazotized benzidine. Those skilled in the art will be able to select appropriate cross-linking agents and cross-linking chemistries to effect conjugation of desired antigens with a chosen carrier protein. For example, glutaraldehyde is a suitable homobifunctional cross-linking agent for creating cross-links between the NH₂-functional side-chains of lysine residues in the polypeptide antigen and a carrier protein.

Notwithstanding the foregoing discussion, the immunogenic cold spot polypeptides of the present invention will typically not be conjugated or cross-linked to a carrier protein, as this would not be necessary to render them immunogenic. In fact, it is contemplated that the cold spot polypeptides disclosed herein may themselves be useful as a carrier protein for conjugation with weakly immunogenic antigens, such as S. pneumoniae capsular polysaccharides. Additionally, the cold spot polypeptides of the present invention may be admixed with conventional conjugate vaccines (e.g., PREVNAR®-7, PREVNAR®-13) or capsular polysaccharide antigen compositions (e.g., PNEUMOVAX®-23) to produce a compound vaccine composition. Such compound compositions would be expected to have extended range—vaccination coverage extending beyond the conjugate vaccine or capsular polysaccharide vaccine components used in the admixture.

Methods for the preparation and formulation of vaccine compositions are well known to those skilled in the art. The choice of ingredients will for instance vary depending on the intended administration route of the composition. For example, vaccine compositions for parenteral administration may include pharmaceutically acceptable sterile aqueous or non-aqueous solvents, suspensions, and emulsions. Examples of non-aqueous solvents include, but are not limited to, propylene glycol, polyethylene glycol, vegetable oils (such as olive oil), and injectable organic esters such as ethyl oleate. Topical carriers or occlusive dressings can be used to increase skin permeability and enhance antigen absorption for topically applied vaccine compositions. Liquid dosage forms for oral administration may generally comprise, for example, a liposome composition containing the liquid dosage form. Suitable forms for suspending liposomes include emulsions, suspensions, solutions, syrups, and elixirs containing inert diluents commonly used in the art, such as purified sterile water.

An immunogenic composition or vaccine comprising an S. pneumoniae cold spot polypeptide, or immunogenic fragment thereof, of the invention may also comprise any of a variety of known adjuvants that enhance an immune response, especially production of antibodies, to an antigen. Examples of adjuvants that may be used in an immunogenic or vaccine composition comprising an S. pneumoniae cold spot polypeptide, or an immunogenic fragment thereof, described herein include, but are not limited to:

(1) Freund's complete adjuvant (FCA) or Freund's incomplete adjuvant (FIA) may be employed to boost an immune response to a polypeptide in a non-human animal in order to enhance the level (titer) of antibody produced to the polypeptide. FCA comprises a water in mineral oil emulsion and inactivated mycobacteria. FIA comprises a water in mineral oil emulsion without the mycobacteria component of FCA. FCA and FIA are particularly useful when antibody is being produced in a non-human animal for research purposes. Owing to various side effects, including severe local reactions at sites of inoculation, FCA is not approved for use in humans and, therefore, will not comprise a component of an immunogenic composition or vaccine intended for administration to human subjects. FIA has been used in some vaccines administered to humans, but is less preferred in view of undesirable side effects compared with other approved alternatives, such as alum salts and monophosphoryl lipid A discussed below.

(2) Aluminum salts, such as alum salts, are particularly useful as adjuvants to stimulate an immune response to a polypeptide in vaccine compositions administered to humans. The term “alum” is commonly used to refer to a compound that is a combination of aluminum and potassium sulfate salt (such as KAl(SO₄)₂·12H₂O), or to an aluminum hydroxide. Although relatively weak when compared to other adjuvants, alum salts are the most commonly employed adjuvants used in vaccine compositions that are manufactured for administration to human subjects owing to the low incidence, if any, of adverse side effects. The term “alum” may sometimes also be used to refer to a broader group of aluminum and ammonium sulfate salts. Other aluminum salts that may be used to stimulate an immune response to a cold spot polypeptide, or immunogenic fragment thereof, present in an immunogenic composition or vaccine include, without limitation, aluminum hydroxide, aluminum phosphate, aluminum sulfate, hydrated alumina, alumina hydrate, alumina trihydrate (ATH), aluminum hydrate, aluminum trihydrate, alhydrogel, Superfos, Amphogel, aluminum (III) hydroxide, aluminum hydroxyphosphate sulfate, amorphous alumina, trihydrated alumina, and trihydroxyaluminum.

(3) Monophosphoryl lipid A, which, like alum, is an adjuvant that has been approved and used in vaccines manufactured for administration to human subjects.

(4) Oil-in-water emulsion formulations (with or without other specific immunostimulating agents such as muramyl peptides or bacterial cell wall components), such as, for example, (a) MF59 (International Publication No. WO 90/14837), containing 5% squalene, 0.5% Tween 80, and 0.5% Span 85 (optionally containing various amounts of MTP-PE (see below, although not required)) formulated into submicron particles using a microfluidizer such as Model 110Y microfluidizer (Microfluidics, Newton, Mass.); (b) SAF, containing 10% squalene, 0.4% Tween 80, 5% pluronic-blocked polymer L121, and thr-MDP (see below) either microfluidized into a submicron emulsion or vortexed to generate a larger particle size emulsion; and (c) RIBI™ adjuvant system (RAS) (Corixa, Hamilton, Mont.), containing 2% squalene, 0.2% Tween 80, and one or more bacterial cell wall components from the group consisting of 3-O-deacylated monophosphoryl lipid A (MPL™) described in U.S. Pat. No. 4,912,094 (Corixa), trehalose dimycolate (TDM), and cell wall skeleton (CWS), preferably MPL+CWS (Detox™).

(5) Muramyl peptides used as adjuvants may include, but are not limited to, N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP) and N-acetyl-normuramyl-L-alanine-2-(1′-2′-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine (MTP-PE).

(6) Saponin adjuvants, such as Quil A or STIMULON™ QS-21 (Antigenics, Framingham, Mass., USA) (U.S. Pat. No. 5,057,540) may be used, or particles generated therefrom such as ISCOMs (immunostimulating complexes).

(7) Bacterial lipopolysaccharide (LPS) or synthetic lipid A analogs, such as aminoalkyl glucosamine phosphate compounds (AGP), or derivatives or analogs thereof (available from Corixa), as described in U.S. Pat. No. 6,113,918. One such AGP is 2-[(R)-3-tetradecanoyloxytetradecanoylamino]ethyl 2-deoxy-4-O-phosphono-3-O-[(R)-3-tetradecanoyloxytetradecanoyl]-2-[(R)-3-tetradecanoyloxytetradecanoylamino]-β-D-glucopyranoside, which is also known as 529 (formerly known as RC529), which is formulated as an aqueous form or as a stable emulsion.

(8) Synthetic polynucleotides, such as oligonucleotides containing CpG motif(s) as described, for example, in U.S. Pat. No. 6,207,646 may also be used in vaccine compositions of the invention.

(9) Cytokines, including, but not limited to, interleukins (for example, any one of IL-1, IL-2, IL-4, IL-5, IL-6, IL-7, IL-12, IL-15, IL-18, etc.), interferons (for example, gamma interferon), granulocyte macrophage colony stimulating factor (GM-CSF), macrophage colony stimulating factor (M-CSF), tumor necrosis factor (TNF), and costimulatory molecules B7-1 and B7-2, may be used as adjuvants in immunogenic compositions and vaccines of the invention.

(10) Detoxified mutants of a bacterial ADP-ribosylating toxin, such as a cholera toxin (CT), either in a wild-type or mutant form (for example, where the glutamic acid at amino acid position 29 is replaced by another amino acid, preferably a histidine, in accordance with International Publication No. WO 00/18434; see also WO 02/098368 and WO 02/098369), a pertussis toxin (PT), or an E. coli heat-labile toxin (LT), particularly LT-K63, LT-R72, CT-S109, PT-K9/G129 (see, e.g., WO 93/13202 and WO 92/19265).

Additional descriptions and details of the above adjuvants and others may be found in various pharmaceutical science and compositions handbooks and reviews available in the art. See, for example, Petrovsky et al., Immunol. Cell Biol., 82: 488-496 (2004), incorporated herein by reference.

It is understood that any immunogenic composition or vaccine comprising a cold spot polypeptide, or immunogenic fragment thereof, of the invention will require regulatory approval for non-experimental use of the composition in human subjects. Accordingly, an adjuvant that is not commonly employed in immunogenic compositions and vaccines may nevertheless be approved when present as a component of a particular formulation that is granted regulatory approval for use in humans. A regulatory approval process typically weighs the benefit of a vaccine with respect to pneumococcal infection and disease against any risk or severity of untoward side effects.

Thus, any of a variety of pharmaceutically acceptable ingredients may be used in preparing an immunogenic composition comprising one or more S. pneumoniae cold spot surface antigens, or immunogenic polypeptide fragments thereof, according to the invention. Preferably, such immunogenic compositions are prepared that are used to raise an immune response in a human that produces antibodies reactive with a majority of S. pneumoniae capsular serotypes, more preferably with at least 80 different S. pneumoniae serotypes, even more preferably with at least 90 different S. pneumoniae serotypes, and still more preferably, with all or nearly all pneumococcal strains (currently, ˜94 known serotypes).

The invention also provides methods of making immunogenic compositions described above. For example, a method of making an immunogenic composition for raising an immune response producing antibodies reactive against at least 80 different serotypes of S. pneumoniae comprises:

(1) selecting one or more cold spot surface antigens of S. pneumoniae,

(2) isolating one or more polypeptide segments from an extracellular domain of the one or more cold spot surface antigens selected in step (1), and

(3) formulating said one or more isolated polypeptide segments obtained in step (2) by admixing said isolated polypeptide segments with a pharmaceutically acceptable carrier, to produce an immunogenic composition for inoculating human subjects against S. pneumoniae infection.

In an embodiment of the above method of making an immunogenic composition according to the invention, 1, 2, 3, 4, or 5 cold spot surface antigens are selected in step (1). In another embodiment of the above method, 1 or 2 cold spot surface antigens are selected. In a further embodiment of the method described above, 1 cold spot surface antigen is selected in step (1).

Preferably, in the above method of making an immunogenic composition according to the invention, the composition, when administered to a mammalian subject, is effective for raising an immune response producing antibodies reactive with at least 90 S. pneumoniae serotypes, more preferably with at least 91 S. pneumoniae serotypes, and even more preferably with at least 93 S. pneumoniae serotypes.

In addition, in the above method of making an immunogenic composition according to the invention, the immunogenic composition may also include one or more adjuvants.

Uses of Cold Spot Polypeptide Compositions

As noted above, an immunogenic composition comprising an S. pneumoniae cold spot polypeptide, or immunogenic fragment thereof, described herein may be used to raise polyclonal or monoclonal antibodies for research or for passive immunity (immunotherapy).

For example, a polyclonal or monoclonal antibody raised against a cold spot polypeptide can be used to identify fragments of the cold spot polypeptide that are bound by the antibody (“antigenic” fragments). In a general, but non-limiting, example, standard deletion mutation protocols are available in the art and/or from commercial sources (for example, polymerase chain reaction (PCR) or nuclease deletion protocol) that can be used to generate a nested set of deletion mutated (truncated) sequences of a cold spot gene. Each of the corresponding encoded deletion mutant (truncated) polypeptides may then be expressed using a standard recombinant expression system (for example, but not limited to, a bacterial or yeast expression system) or by a cell-free in vitro transcription and translation system, which are readily available in the art and/or from commercial sources. Some deletion protocols also provide internal deletions of sequences for additional or closer analyses of encoded mutated polypeptides. Each of the expressed mutant polypeptides can then be assayed to determine whether or not it is bound by a polyclonal or monoclonal antibody raised to the cold spot protein using any of a variety of immuno-detection assays (for example, enzyme-linked immunosorbent assay (ELISA), immuno-dot blot, Western blot), which provide relatively rapid, multi-sample determination of antibody binding. Such mutant analysis to identify antigenic fragments of a cold spot polypeptide can also be carried out under contract with a commercial laboratory.

Other immunogenic compositions may be designed and used to determine efficacy as a potential vaccine composition to produce antibody that provides protective immunity against S. pneumoniae infection in a human or non-human animal subject or an animal model of pneumococcal infection. A “protective” immunity or “protective” immune response, when used in the context of a S. pneumoniae cold spot polypeptide or immunogenic fragment thereof, a composition comprising a S. pneumoniae cold spot polypeptide or immunogenic fragment thereof, or a method comprising administration of a S. pneumoniae cold spot polypeptide or immunogenic fragment thereof indicates that inoculation with such polypeptide, such composition, or according to such method results in a detectable immune response of sufficient magnitude (antibody titer) to provide a level of protection against S. pneumoniae infection in the inoculated human or non-human animal subject. Such protective immunity includes therapeutic and/or prophylactic effects that reduce the likelihood of S. pneumoniae infection or the likelihood of contracting one or more disorders resulting from such infection, as well as reducing the severity of the infection and/or a disorder resulting from such infection. As such, a protective immune response includes, for example, the ability to reduce bacterial load, ameliorate one or more disorders or symptoms associated with S. pneumoniae infection, and/or to delay the onset of disease progression resulting from S. pneumoniae infection. Diseases resulting from S. pneumoniae infection of particular interest with respect to this invention are pneumonia (lung infection), otitis media (middle ear infection), sinusitis (sinus infection), sepsis (bacteremia, bloodstream infection), and meningitis (infection of the meninges, which comprise three membranes covering the brain and spinal cord).

The level of protection can be assessed initially using animal models and subsequently phased clinical trials. The nature of an immune response may be measured, for example, by flow cytometry, development of antibodies, ELISA, opsonophagocytosis (OPA), or by measuring resistance to pathogen challenge in vivo. In a non-limiting example, to determine whether a protective immune response is induced by compositions of the present invention, immunized test animals can be challenged with Streptococcus pneumoniae and the growth and spread of the bacteria in the animals, survival of challenged animals, or change in health of the challenged animals may be measured over time. Such a study may also monitor progression of symptoms of any disease caused by pneumococcal infection in one or more tissues or organs.

As explained above, cold spot polypeptides encoded by S. pneumoniae cold spot genes are highly conserved across most if not all known capsular serotypes of S. pneumoniae. Therefore, a pharmaceutically acceptable immunogenic composition comprising a cold spot polypeptide, or an immunogenic fragment thereof, as described herein may be used as a vaccine that is administered to a human subject to elicit an immune response to the cold spot polypeptide or immunogenic fragment thereof, and thereby to any or all S. pneumoniae capsular serotypes that express the same highly conserved cold spot polypeptide or immunogenic fragment. Such a vaccine is particularly desirable because currently marketed vaccines are designed to elicit an immune response to only a limited number of the 94 or so known pneumococcal polysaccharide capsular serotypes. As just one example, the vaccine currently marketed as PREVNAR-13® (Wyeth LLC, marketed by Pfizer, New York, N.Y.) comprises 13 polysaccharide capsular antigens from 13 selected pneumococcal capsular serotypes and thus is formulated to elicit an immune response to infection by any of the 13 selected S. pneumoniae serotypes but none of the remaining 81 known S. pneumoniae capsular serotypes. In contrast, each cold spot polypeptide described herein is highly conserved and expressed across all or nearly all known S. pneumoniae capsular serotypes, and therefore an immune response elicited to a cold spot polypeptide, or immunogenic fragment thereof, provides an immune response (such as antibodies) directed to most if not all known serotypes of S. pneumoniae. Moreover, the likelihood that a breakthrough escape mutant will emerge is significantly lower owing to the recombinationally quiescent (cold spot) origin of the gene encoding the cold spot polypeptide. It is expected that a vaccine composition comprising as few as two cold spot polypeptides or immunogenic fragments thereof would virtually eliminate the possibility of a breakthrough strain of S. pneumoniae emerging.

Accordingly, in a preferred embodiment, an immunogenic composition of the invention useful as a vaccine against pneumococcal infection comprises at least one S. pneumoniae cold spot polypeptide or immunogenic fragment thereof, as described herein; a pharmaceutically acceptable vehicle (such as PBS); and, as optionally desired, a pharmaceutically acceptable adjuvant (for example, an alum salt or another adjuvant described above). Based on the property of a vaccine composition of the invention comprising at least one S. pneumoniae cold spot polypeptide or immunogenic fragment thereof to elicit an immune response that produces antibodies to most if not all pneumococcal serotypes, a vaccine composition of the invention may also be referred to as a “capsular serotype-independent vaccine” or, alternatively, as a “universal S. pneumoniae vaccine”.

Detection and Characterization of Antibodies Raised Against Cold Spot Polypeptides

Antibodies raised to a S. pneumoniae cold spot polypeptide, or an immunogenic fragment thereof, may be detected using any of a variety of assays that are known in the art and/or that are commercially available. Such assays present a cold spot polypeptide or an antigenic fragment thereof that can be brought into contact with a sample that contains or is suspected of containing antibody raised to the cold spot polypeptide or an immunogenic fragment thereof. Such assay formats include, but are not limited to, immunoprecipitation, enzyme linked immunosorbent assay (ELISA), opsonophagocytosis (OPA), immuno-dot blot assay, Western (immuno) blotting, immuno-affinity chromatography, flow cytometry (FCM), and fluorescence-activated cell sorting (FACS). In such assays, antibody bound to the cold spot polypeptide or antigenic fragment thereof, forming an antibody-antigen complex, can be detected by any of a variety of detection systems available in the art. Such detection systems may include the use of a secondary antibody that will bind to the antibody raised against the cold spot polypeptide or fragment thereof. Such a secondary antibody may be conjugated or otherwise linked to a detectable label, such as, but not limited to, a radiolabel, a fluorescent label, biotin, or an enzyme that can generate a detectable signal when provided with its corresponding substrate. Examples of enzymes that may be conjugated to a secondary antibody in such detection systems include, but are not limited to, horseradish peroxidase, alkaline phosphatase, and luciferase. Other standard assays for detecting antibody or determining the titer of antibody in a sample are known to the skilled practitioner in the art.

Antibodies can be assessed for binding affinity and potency using various analytical ELISA formats or by detecting binding in real time using surface plasmon resonance (SPR) techniques using, for example, a BIACORE™ SPR detection instrument (GE Healthcare Bio-Sciences, Pittsburgh, Pa., USA). Such affinity assays may be especially useful in characterizing antibody used for research or passive immunity studies.

Additional embodiments and features of the invention will be apparent from the following non-limiting examples.

EXAMPLES

Streptococcus pneumoniae Highly Conserved “Cold Spot” Proteins Useful in Pneumococcal Vaccines

By analysis of 50-kb regions upstream and downstream of the four rrn operons of all available S. pneumoniae genomes on the NCBI website as outlined supra, recombinationally quiescent or “cold spot” genes were identified that encode polypeptides (cold spot polypeptides) of extremely high retention from strain to strain and extremely low variability in sequence. Among such cold spots in the S. pneumoniae genome, it was hypothesized that proteins would be encoded that would be valuable as vaccine antigens. Using sub-cellular protein localization algorithms (Gpos-mPloc, described by Shen et al., Protein Pept. Lett., 16(12): 1478-1484 (2009); PSORTdb described by Yu et al., Nucleic Acids Res., 39 (database issue): D241-D244 (2011); and LocateP described by Zhou et al., BMC Bioinformatics, 9(173): 17 pages (2008)), 104 cold spot genes, encoding putative expressed proteins, were identified from the rrn flanking regions. At the time of the analysis, 231 fully sequenced S. pneumoniae genomes were available for comparison. Cold spot genes that were in-common (present) in all strains and showed very low sequence variability among all 231 sequenced genomes reported in the NCBI pneumococcal database were of particular interest, as these two characteristics gave an initial indication that such phenotypes might be essential for survival of the cell, since S. pneumoniae genomes deleting or significantly mutating the protein evidently did not survive to propagate.

A search of the scientific literature was conducted on each of the 104 genes and their predicted expression products, and on cognate proteins of other organisms, to determine likely biological properties and functions of the encoded surface proteins. From the literature survey, ˜79 genes were selected as likely encoding surface proteins and ranked in order of most likely to be essential to cell survival. The top 21 surface proteins from the 79 identified surface protein genes located in the flanking regions of the S. pneumoniae rrn operons were further analyzed for universal presence in the 231 known, fully sequenced S. pneumoniae genomes and to determine the exact degree of sequence invariability across the population of different S. pneumoniae strains. Sequence information and amino acid sequence homology for the 21 cold spot surface proteins and their allelic variants is shown in Tables 1A and 1B, above. Identification of 58 additional surface proteins from the cold spot flanking regions adjacent the rrn operons of S. pneumoniae is made in Table 2, below.

The 58 genes in the table below are listed by consecutive map positions, located immediately up-stream and down-stream of the 4 rrn operons (rrnA, rrnB, rrnC, rrnD) of the genomically sequenced S. pneumoniae isolate TIGR4. The data in Table 2 were acquired by examination of at least 123 pneumococcal isolates. The data presented in Tables 1A and 1B (supra) are thus more complete, representing comparison of all 231 fully sequenced pneumococcal genomes (as well as additional sequences from incomplete genomes) available at the time. In Table 2, below, in addition to the mapped position for the 58 additional surface proteins identified, the mapped position in the TIGR4 S. pneumoniae genome of the four rrn operons is also given, to show the location of the cold spot genes in relation to the positions of the rrn operons.

TABLE 2 Additional Surface Protein-Encoding Genes from rrn Operon-Flanking Regions of S. pneumoniae

| Designation in TIGR4 Genomic Sequence | Gene Encoded Protein and Function (if known) | Presence across ≥123 genomes | % Sequence Homology | Mapped position in TIGR4 genome, including stop codon (SEQ ID NO:) | GenBank protein length (aa) |
|---|---|---|---|---|---|
| SP_2207 | ComF, putative competence protein; KEGG pathway Type II secretion system | 100% | >99% | 2127745-2128407 (SEQ ID NO: 274) | 220 |
| SP_2216 | PcsB, putative murein hydrolase or protein required for cell wall separation | 98% | >99% | 2135267-2136445 (SEQ ID NO: 275) | 392 |
| SP_2223 | Hypothetical protein; Function unknown. | 100% | >99% | 2140901-2141731 (SEQ ID NO: 276) | 276 |
| SP_2226 | Hypothetical protein; Function unknown | 100% | >99% | 2144453-2144821 (SEQ ID NO: 277) | 122 |
| SP_0005 | PTH, peptidyl-tRNA hydrolase | 100% | >99% | 4382-4951 (SEQ ID NO: 278) | 189 |
| SP_0007 | SS4Dom | 100% | >99% | 8519-8785 (SEQ ID NO: 279) | 88 |
| SP_0008 | Hypothetical protein, PY-triad | 100% | >99% | 8778-9146 (SEQ ID NO: 280) | 122 |
| SP_0010 | Hypothetical protein, PY-triad | 100% | >99% | 9266-10534 (SEQ ID NO: 281) | 422 |
| rrnA operon genes | 5S, 23S and 16S rrn structural RNA molecules | 100% | | 15344-20236 (SEQ ID NO: 282) | 5S-23S-16S |
| SP_0016 | IS630-Spn1, transposase Orf2; Transposase and inactivated derivatives | 100% | >99% | 20929-21267 (SEQ ID NO: 283) | 112 |
| SP_0020 | cytidine/deoxycytidylate deaminase family protein | 100% | >99% | 24319-24786 (SEQ ID NO: 284) | 155 |
| SP_0021 | Dut, deoxyuridine 5′-triphosphate nucleotidohydrolase | 100% | >99% | 24973-25416 (SEQ ID NO: 285) | 147 |
| SP_0024 | conserved hypothetical protein, Carbonic anhydrase | 100% | >99% | 27381-27878 (SEQ ID NO: 286) | 165 |
| SP_0030 | ccs16, competence-induced protein Ccs16; | 100% | >99% | 30563-30889 (SEQ ID NO: 287) | 108 |
| SP_0037 | plsX, fatty acid/phospholipid synthesis protein; Lipid transport and metabolism | 98% | >99% | 38101-39093 (SEQ ID NO: 288) | 330 |
| SP_0041 | blpU, bacteriocin, PY-triad | 100% | >97% | 39871-40101 (SEQ ID NO: 289) | 76 |
| SP_0046 | purF, amidophosphoribosyl transferase; Nucleotide transport and metabolism. | 100% | >97% | 49228-50670 (SEQ ID NO: 290) | 480 |
| SP_0048 | purN, phosphoribosyl glycinamide formyltransferase; Nucleotide transport and metabolism: | 98% | >99% | 51908-52453 (SEQ ID NO: 291) | 181 |
| SP_0050 | purH, bifunctional phosphoribosyl aminoimidazole-carboxamide formyl transferase/IMP cyclohydrolase; Purine metabolism | 100% | >99% | 53071-54618 (SEQ ID NO: 292) | 515 |
| SP_0053 | purE, phosphoribosyl aminoimidazole carboxylase catalytic subunit; Catalyzes a step in the de novo purine nucleotide biosynthetic pathway | 100% | >99% | 56405-56893 (SEQ ID NO: 293) | 162 |
| SP_0057 | PY-triad, Beta-N-acetylhexosaminidase; identified by match to PFAM protein family HMM PF00746 | 98% | >98% | 59624-63562 (SEQ ID NO: 294) | 1,312 |
| SP_1732 | StkP, serine/threonine protein kinase; | 100% | >98% | 1634699-1636678 (SEQ ID NO: 295) | 659 |
| SP_1793 | hypothetical protein; spoU rRNA methylase family | 63% | >95% | 1708735-1709310 (SEQ ID NO: 296) | 191 |
| SP_1847 | xpt, xanthine phosphoribosyl transferase; Adenine/guanine phosphoribosyl transferases and related PRPP-binding proteins; Nucleotide transport and metabolism | 100% | >98% | 1755277-1755858 (SEQ ID NO: 297) | 193 |
| SP_1850 | dpnC, type II restriction endonuclease DpnI; Recognizes the double-stranded and methylated sequence G(Me)ATC and cleaves after A-2; Endonucleolytic cleavage of DNA | 100% | >99% | 1757628-1758392 (SEQ ID NO: 298) | 254 |
| SP_1855 | alcohol dehydrogenase, zinc-containing; threonine dehydrogenase and related Zn-dependent dehydrogenases; Amino acid transport and metabolism | 100% | >99% | 1762944-1763981 (SEQ ID NO: 299) | 345 |
| SP_1865 | pepA, glutamyl-aminopeptidase; cellulase M and related proteins; Acting on peptide bonds; Carbohydrate transport and metabolism | 100% | >99% | 1771923-1772987 (SEQ ID NO: 300) | 354 |
| SP_1897 | sugar ABC transporter substrate-binding protein, PY-triad | 98% | >99% | 1803124-1804383 (SEQ ID NO: 301) | 419 |
| rrnD operon genes | 5S, 23S and 16S rrn structural RNA molecules | 100% | | 1810171-1815064 (SEQ ID NO: 302) | 5S-23S-16S |
| SP_1911 | Thioredoxin, putative; similar to SP: P29449 PID: 20047; identified by sequence similarity | 100% | >96% | 1824358-1824675 (SEQ ID NO: 303) | 105 |
| SP_1912 | Hypothetical protein | 100% | >96% | 1824672-1824971 (SEQ ID NO: 304) | 99 |
| SP_1923 | ply, Pneumolysin; identified by match to PFAM protein family HMM PF01289; | 100% | >99% | 1831896-1833311 (SEQ ID NO: 305) | 471 |
| SP_1937 | lytA, Autolysin; identified by match to PFAM protein family HMM PF015100 acetylmuramoyl-L-alanine amidase, phage origin | 100% | >98% | 1840405-1841361 (SEQ ID NO: 306) | 318 |
| SP_1947 | Hypothetical protein | 91% | >92% | 1850094-1850264 (SEQ ID NO: 307) | 56 |
| SP_1948 | Hypothetical protein; similar to PID: 559859; identified by sequence similarity | 100% | >99% | 1850367-1850591 (SEQ ID NO: 308) | 74 |
| SP_1949 | Hypothetical protein | 100% | >99% | 1850598-1850786 (SEQ ID NO: 309) | 62 |
| SP_1954 | serine protease subtilase family protein | 99% | >99% | 1857994-1859397 (SEQ ID NO: 310) | 467 |
| SP_1955 | Hypothetical protein | 95% | >99% | 1859412-1859723 (SEQ ID NO: 311) | 103 |
| SP_1980 | cbf1, cmp-binding factor1; Predicted HD-superfamily hydrolase | 100% | >99% | 1884257-1885183 (SEQ ID NO: 312) | 308 |
| SP_1990 | Primase-related protein; topoisomerase primase (TOPRIM) nucleotidyl transferase/hydrolase domain found in Ribonuclease M5 | 100% | >99% | 1893835-1894395 (SEQ ID NO: 313) | 186 |
| SP_1991 | Hydrolase; DNase TatD; TatD like proteins | 100% | >99% | 1894395-1895168 (SEQ ID NO: 314) | 257 |
| SP_1994 | alaT, Aminotransferase AlaT (alanine aminotransferase); Aspartate/tyrosine/aromatic aminotransferase; Transferase activity, transferring nitrogenous groups | 100% | >99% | 1896929-1898143 (SEQ ID NO: 315) | 404 |
| SP_1997 | Cof family protein; Hydrolyse activity (catalyzing hydrolysis of various bonds, e.g., C-O, C-N, C-C, phosphoric anhydride bonds, etc.) | 98% | >98% | 1899242-1900630 (SEQ ID NO: 316) | 462 |
| rrnC operon genes | 5S, 23S and 16S rrn structural RNA molecules | | | 1908828-1913721 (SEQ ID NO: 317) | 5S-23S-16S |
| SP_2010 | pbp2A, Penicillin-binding protein 2A (penicillin-binding protein, 1A family); Membrane carboxypeptidase; Cell wall/membrane/envelope biogenesis; Biosynthesis and degradation of murein sacculus and peptidoglycan | 100% | >99% | 1915717-1917912 (SEQ ID NO: 318) | 731 |
| SP_2014 | IS630-Spn1, transposase Orf2; DNA replication, recombination, and repair | Not Determined | Not Determined | 1921149-1921487 (SEQ ID NO: 319) | 112 |
| SP_2021 | Glycosyl hydrolase, family 1; Beta-glucosidase/6-phospho-beta-glucosidase/beta-galactosidase; Carbohydrate transport and metabolism | Not Determined | Not Determined | 1926449-1927858 (SEQ ID NO: 320) | 469 |
| SP_2027 | Hypothetical protein; COG4642, Uncharacterized protein conserved in bacteria (Function unknown) | Not Determined | Not Determined | 1933726-1934136 (SEQ ID NO: 321) | 136 |
| SP_2030 | Tkt, Transketolase; carbohydrate transport and metabolism; catalyzes the formation of ribose 5-phosphate and xylulose 5-phosphate from sedoheptulose 7-phosphate and glyceraldehyde 3-phosphate | Not Determined | Not Determined | 1935028-1937004 (SEQ ID NO: 322) | 658 |
| SP_2042 | rnpA, Ribonuclease P; protein component of RNaseP which catalyzes the removal of the 5′-leader sequence from pre-tRNA to produce the mature 5′ terminus | Not Determined | Not Determined | 1947360-1947731 (SEQ ID NO: 323) | 123 |
| SP_2051 | cglC, competence protein CglC; similar to GP: 3211750; identified by sequence similarity | Not Determined | Not Determined | 1952343-1952669 (SEQ ID NO: 324) | 108 |
| SP_2055 | Alcohol dehydrogenase, zinc-containing; amino acid transport and metabolism | Not Determined | Not Determined | 1955168-1956226 (SEQ ID NO: 325) | 352 |
| SP_2056 | nagA, N-acetylglucosamine-6-phosphate deacetylase; Hydrolase activity acting on carbon-nitrogen bonds other than peptide bonds | Not Determined | Not Determined | 1956389-1957540 (SEQ ID NO: 326) | 383 |
| SP_2057 | Hypothetical protein; Predicted acyltransferase | Not Determined | Not Determined | 1957693-1959510 (SEQ ID NO: 327) | 605 |
| SP_2063 | LysM domain-containing protein; Lysin domain, found in a variety of enzymes involved in bacterial cell wall degradation | Not Determined | Not Determined | 1963634-1964746 (SEQ ID NO: 328) | 370 |
| SP_2064 | HAD superfamily hydrolase, haloacid dehalogenase-like family; identified by match to PFAM protein family HMM PF00702 | Not Determined | Not Determined | 1964912-1965532 (SEQ ID NO: 329) | 206 |
| SP_2066 | thrC, Threonine synthase; catalyzes formation of L-threonine from O-phospho-L-homoserine; Amino acid transport and metabolism | Not Determined | Not Determined | 1966877-1968361 (SEQ ID NO: 330) | 494 |
| rrnB operon genes | 5S, 23S and 16S rrn structural RNA molecules | | | 1970844-1975734 (SEQ ID NO: 331) | 5S-23S-16S |
| SP_2093 | hypothetical protein | Not Determined | Not Determined | 2001118-2002086 (SEQ ID NO: 332) | 322 |
| SP_2097 | 2,3,4,5-tetrahydropyridine-2-carboxylate N-succinyltransferase, putative; Amino acid transport and metabolism | Not Determined | Not Determined | 2004797-2005495 (SEQ ID NO: 333) | 232 |
| SP_2105 | Hypothetical protein | Not Determined | Not Determined | 2014784-2015356 (SEQ ID NO: 334) | 190 |
| SP_2106 | malP, glycogen phosphorylase family protein; Glucan phosphorylase; Carbohydrate transport and metabolism | Not Determined | Not Determined | 2016399-2018657 (SEQ ID NO: 335) | 752 |

To determine that the cold spot genes selected were in-common to all S. pneumoniae strains, genomic preparations immobilized on Zeta-Probe nylon membranes (Bio-Rad Laboratories, Inc.) were probed with a radiolabeled probe derived from the coding sequence for each of the 21 selected cold spot genes. In this way, it was determined that each of the selected cold spot genes was universally present across the inventor's laboratory collection of 2,500 phylogenetically organized S. pneumoniae isolates. This is significant, because all protein-based vaccine candidates offered thus far have proved to be non-universal across all strains or have proved to be highly variable in sequence when multiple strains are considered. As a result, such vaccines fail to address a significant number of S. pneumoniae strains, or they allow “breakthrough” of strains in which the antigen targeted by a candidate vaccine evades the immune response induced by the original vaccine. See, e.g., Navarro-Torné et al., Emerging Infectious Diseases, 21(3):417-425 (2015).

Next, amino acid sequences of the predicted expression products of the surface protein genes were analyzed for sequence homology. The reference population of genomic sequences for this work was the 231 fully sequenced S. pneumoniae genomes available in the NCBI GenBank database at the time. For some proteins, alleles from additional strains for which the entire genome had not been fully sequenced but which were nonetheless available from the GenBank database were also analyzed. Amino acid sequences for each of the 21 proteins in each of the 231+ full or partial genomes were aligned and the degree of variability in sequence was determined. For each protein, a predominant sequence appeared, in which the amino acid sequence was the same across a majority of the strains considered, followed by allelic variants of varying numbers. The allelic variants differed from the predominant sequence by from one amino acid switch to as many as 23 amino acid sequence variations along the full protein sequence, for the 21 proteins studied. The number of alleles varied across the 21 cold spot genes studied from 1 to 34.
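By way of illustration only, the tallying step described in the preceding paragraph can be sketched in a few lines of Python. The function name, the toy sequences, and the assumption that the input sequences are already aligned to equal length (substitutions only, no insertions or deletions) are hypothetical and are not taken from the work described herein.

```python
from collections import Counter

def summarize_alleles(aligned_seqs):
    """Tally alleles among pre-aligned, equal-length amino acid sequences.

    Returns the predominant sequence, the number of strains carrying it,
    and a list of (variant sequence, strain count, differences from the
    predominant sequence), ordered from most to least prevalent.
    """
    if not aligned_seqs:
        raise ValueError("no sequences supplied")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be pre-aligned to equal length")
    ranked = Counter(aligned_seqs).most_common()
    predominant, n = ranked[0]
    variants = []
    for seq, count in ranked[1:]:
        # Count positions at which the variant differs from the predominant sequence.
        diffs = sum(1 for a, b in zip(predominant, seq) if a != b)
        variants.append((seq, count, diffs))
    return predominant, n, variants

# Hypothetical toy data: eight "strains" of an eight-residue peptide.
toy = ["MKTAYIAK"] * 5 + ["MKTAYIAR"] * 2 + ["MRTAYIAR"]
predominant, n, variants = summarize_alleles(toy)
print(predominant, n, variants)
```

In this toy set the predominant sequence is carried by five strains, and the two variant alleles differ from it at one and two positions, respectively.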

To quantitate the degree of sequence variability, the percent average amino acid sequence pairwise homology was determined for each cold spot polypeptide, according to the formula:

$\frac{\left( {100\% \times n} \right) + \left( {x\mspace{14mu} \% \times n\; 1} \right) + \left( {x\mspace{14mu} \% \times n\; 2} \right) + {\ldots \mspace{14mu} \left( {x\mspace{14mu} \% \times {nI}} \right)}}{\left( {{total}\mspace{14mu} {number}\mspace{14mu} {of}\mspace{14mu} {sequences}\mspace{14mu} {compared}} \right)}$

where: x % is the percent sequence identity in comparison to the predominant aa sequence;

n is the number of strains having the most prevalent (predominant) amino acid sequence;

n1 is the number of allelic variants of the next most prevalent amino acid sequence, after n;

n2 is the number of allelic variants of the next most prevalent amino acid sequence, after n1;

nI is the number of allelic variants of the least prevalent allelic variant sequence, where I is the total number of allelic variants in the population of amino acid sequences considered.

Accordingly, by way of example, a S. pneumoniae surface protein of 450 amino acids, for which 65 of 231 compared genomic sequences carry a variant allele (so that the remaining 166 sequences are identical to the predominant sequence), and where each of the variants has two amino acid differences from the predominant sequence, would have an average amino acid sequence pairwise homology of over 99.87%, calculated as: [(100%×166)+(448/450×100%×65)]÷231=[16600%+6471.11%]÷231=23071.11%÷231=99.875% average amino acid sequence pairwise homology.
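The calculation in the foregoing example can be reproduced with a short Python sketch. The function name and argument layout are hypothetical, and the percent identity of each variant is assumed to be (length minus number of differing residues) divided by length, as in the 448/450 term above.

```python
def average_pairwise_homology(protein_length, n_predominant, variants):
    """Average percent pairwise homology per the formula given above.

    variants is a list of (number of sequences, amino acid differences
    from the predominant sequence) pairs, one pair per allelic variant.
    """
    total = n_predominant + sum(count for count, _ in variants)
    weighted = 100.0 * n_predominant  # predominant sequences count as 100%
    for count, diffs in variants:
        identity = 100.0 * (protein_length - diffs) / protein_length
        weighted += identity * count
    return weighted / total

# Worked example from the text: 450 residues, 166 predominant sequences,
# 65 variant sequences with two differences each, 231 sequences in total.
example = average_pairwise_homology(450, 166, [(65, 2)])
print(f"{example:.3f}%")  # prints 99.875%
```

Additional allelic variants are handled by adding further (count, differences) pairs to the list.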

It will be appreciated that the average amino acid sequence pairwise homology value will be affected by the total number of sequences compared, and therefore the question arises how many sequences must be compared before the pairwise homology number is accurately representative of the variability of the amino acid sequence throughout the universe of S. pneumoniae strains. Because the amino acid sequences encoded by cold spot genes are extremely invariant, the number of sequences that must be compared before a sequence homology value is obtained that is representative of the diversity of the entire population of S. pneumoniae strains is much lower than for a protein that shows more sequence variability across the population. To illustrate this, non-linear regression analysis was performed, plotting the cumulative number of allelic variants against the cumulative number of isolates considered for three cold spot polypeptides, i.e., GBSP23 (SP_2239), GBSP3 (SP_2218), and GBSP6 (SP_2217), in comparison with another well-known S. pneumoniae surface protein, PspA (SP_0117), which is much more variable across the population of S. pneumoniae strains. The comparative plot is shown in FIG. 1. The additional plot shown in FIG. 1, for SP_1872, was also for a cold spot polypeptide, but one having a lesser amino acid sequence pairwise homology (99.19%), and that protein was not extensively studied at this time. From this analysis, it can be seen that for the cold spot genes, the rate of diversification levels off well under 200 isolates, meaning that the results of sequence comparisons reported herein of 231 fully sequenced genomes (plus additional instances of individual genes from incomplete genomes) are representative of the diversity of the universe of S. pneumoniae strains. In contrast, referring in FIG. 1 to the curve for PspA, the rate of diversification has not leveled off on this plot at the limit of 300 compared isolates, meaning that many more than 300 isolates would need to be compared before an accurate analysis of sequence diversity was obtained. This is important, since the number of known and fully sequenced genomes of S. pneumoniae has continued to grow, but calculations of relative constants, such as percentage amino acid sequence homology, are expected not to vary significantly for relatively invariant cold spot genes, provided that about 150 or more separate strains are analyzed. Accordingly, in order to determine a percentage average amino acid pairwise sequence homology for a given coding sequence, the practitioner may compare 150 allelic sequences, or, e.g., 200, 400, 1000, 2000 or more allelic sequences, and the calculated sequence homology value will be expected to be essentially the same.
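The kind of accumulation curve discussed above (cumulative allelic variants versus cumulative isolates) can be sketched as follows. This is a minimal illustration, not the regression analysis actually used for FIG. 1: the function names, the plateau check based on a trailing window, and the toy allele labels are all assumptions introduced here for illustration.

```python
def variant_accumulation(alleles):
    """Cumulative count of distinct allelic variants as isolates are added.

    `alleles` is a sequence of allele identifiers, one per isolate, in the
    order the isolates are considered (hypothetical input format).
    """
    seen = set()
    curve = []
    for allele in alleles:
        seen.add(allele)
        curve.append(len(seen))
    return curve


def new_variants_in_tail(curve, window):
    """Number of new variants discovered among the last `window` isolates."""
    if not curve:
        return 0
    if window >= len(curve):
        return curve[-1]
    return curve[-1] - curve[-1 - window]


# Toy data only: an invariant gene versus a highly variable one.
conserved = ["A"] * 8 + ["B"] + ["A"] * 11
variable = [f"v{i}" for i in range(20)]

print(new_variants_in_tail(variant_accumulation(conserved), 10))
print(new_variants_in_tail(variant_accumulation(variable), 10))
```

A curve that has leveled off yields few or no new variants in the trailing window, whereas a curve that is still rising, as described for PspA, continues to yield new variants as isolates are added.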

From the 21 cold spot surface polypeptides analyzed for essential function, sequence homology, and universal presence in the species, six polypeptides were further analyzed for suitability as vaccine candidates, as detailed below. Similar procedures can be applied to the other cold spot polypeptides determined to be surface-expressed in S. pneumoniae, to be in-common across all strains, and to have very high (99% or more) average amino acid sequence pairwise homology among isolates to confirm their utility as immunogens.

GenScript (Piscataway, N.J. (US)) was contracted to express and raise polyclonal antiserum against six selected cold spot surface polypeptides. Predominant coding sequence information for the proteins GBSP3 (SEQ ID NO:1), GBSP14 (SEQ ID NO:2), GBSP15 (SEQ ID NO:3), GBSP22 (SEQ ID NO:4), GBSP23 (SEQ ID NO:5), and GBSP24 (SEQ ID NO:6) set forth in Table 1A, above, was analyzed to predict such features as expressed amino acid sequence, antigenic index, hydrophilicity, solvent accessibility, flexible regions, coil/sheet/helical regions, and a breakdown of extracellular, transmembrane, and cytoplasmic domains along the length of the protein. For each of the six proposed cold spot polypeptides selected, a structural gene for expression of the largest extracellular segment was constructed, inserted into an expression vector, and expressed in E. coli. Purification of hexaHis- or octaHis-tagged polypeptides via Ni⁺⁺ affinity chromatography was performed prior to enzymatic elimination of the His tag. The portion of each polypeptide expressed and purified is presented in Table 3:

TABLE 3
Extracellular Portions of Cold Spot Surface Proteins for Immunogenicity Studies

Each entry lists the S. pneumoniae surface protein that was the source of the extracellular fragment (with the SEQ ID for the protein fragment), followed by the amino acid sequence of the expressed portion of the S. pneumoniae surface protein.

GBSP3 (SEQ ID NO: 336)
RVVQKPFQWF DSVKSDLAHL TRTYNENESL KKQLYQLEVK SNEVESLKTE NEQLRQLLDM KSKLQATKTL AADVIMRSPV SWKQELTLDA GRSKGASENM LAIANGGLIG SVSKVEENST IVNLLTNTEN ADKISVKIQH GSTTIYGIII GYDKENDVLK ISQLNSNSDI SAGDKVTTGG LGNFNVADIP VGEVVATTHS TDYLTREVTV KLSADTHNVD VIELVGNS

GBSP14 (SEQ ID NO: 337)
LILEVTAVPV FSPTQSVEAV LVLLYDLTTI RTYEKLNLAF VSNASHELRT PVTSIKGFAE TIKGMSAEEE ALKDDFLDII YKESLRLEHI VEHLLTLSKA QQMPIQWTTL SLAEFVQDLT QSLQPQLKKK DLQLKVQVPD DVTLVSDSQL LSQILLNLLS NAIRYTEQGG KIEVKTQKVN EGIKISVSDT GIGISQLEQD RIFERFYRVN KGRSRQTGGT GLGLAIVKEL SQLLGGQVTV TSQLGRGSCF TIFLPNQSFA QD

GBSP15 (SEQ ID NO: 338)
QKNRQEEAKI LQKEEVLRVA KMALQTGQNQ VSINGVEIQV FSSEKGLEVY HGSEQLLAIK EP

GBSP22 (SEQ ID NO: 339)
SLASAVEALL APLKRVKVPV HEIGLMLSMS LRFVPTLMDD TTRIMNAQKA RGVDFGEGSI VQKVKAMIPI LIPLFATSLK RADSLAIAME ARGYQGGKGR SQYRQLKWTL KD

GBSP23 (SEQ ID NO: 340)
TQKSSVNNSN NNSTITQTAY KNENSTTQAV NKVKDAVVSV ITYSANRQNS VFGNDDTDTD SQRISSEGSG VIYKKNDKEA YIVTNNHVIN GASKVDIRLS DGTKVPGEIV GADTFSDIAV VKISSEKVTT VAEFGDSSKL TVGETAIAIG SPLGSEYANT VTQGIVSSLN RNVSLKSEDG QAISTKAIQT DTAINPGNSG GPLINIQGQV IGITSSKIAT NGGTSVEGLG FAIPANDAIN IIEQLEKNGK VTRPALGIQM VNLSNVSTSD IRRLNIPSNV TSGVVVRSVQ SNMPANGHLE KYDVITKVDD KEIASSTDLQ SALYNHSIGD TIKITYYRNG KEETTSIKLN KSSGDLES

GBSP24 (SEQ ID NO: 341)
NSHKVQMEKE IALKQKKFEQ KHLQNYTDEI VGLYNEIRGF RHDYAGMLVS MQMAIDSGNL QEIDRIYNEV LVKANHKLRS DKYTYFDLNN IEDSALRSLV AQSIVYARNN GVEFTLEVKD TITKLPIELL DLVRIMSVLL NNAVEGSADS YKKQMEVAVI KMETETVIVI QNSCKMTMTP SGDLFALGFS TKGRNRGVGL NNVKELLDKY NNIILETEME GSTFRQIIRF KREFE

The six purified cold spot polypeptide candidate antigens were used to inoculate rabbits 2-3 times over 6-8 weeks for production of polyclonal antiserum. Rabbits were selected for immunization to elicit a greater volume of antiserum than would have been produced in, e.g., laboratory mice. About 3 mg of each polypeptide at >85% purity, and 10-50 mg of affinity-purified rabbit polyclonal antibodies (ELISA titer of 1:32,000), were provided for further experiments. For control purposes, pre-inoculation rabbit serum was also supplied. The rabbit antisera were first tested using a pneumococcal whole cell ELISA (WCE) assay. Briefly, S. pneumoniae cells were immobilized in the wells of microtiter plates, and antisera from the cold spot polypeptide immunizations, at 1:10 dilution, were separately added to wells and incubated with the cells. A labeled reporter anti-rabbit IgG antibody was used for colorimetric detection of bound antiserum. Control wells received pre-immune serum for contrast with the test wells.

All six cold spot polypeptides elicited antiserum that recognized the native surface antigens of pneumococcal cells from ten different S. pneumoniae serotypes. The selected serotypes corresponded to ten of the thirteen serotypes addressed by the commercial PREVNAR-13® vaccine. Table 4 shows the range of fold increase in colorimetric response of the polyclonal antiserum compared with the signal produced by pre-immune serum.

TABLE 4
Vaccine Candidate Antigenicity Results Based on Whole Cell ELISA (WCE)

Cold Spot Surface    SEQ ID    Range of Increase in Colorimetric Intensity Across 10 Pneumococcal Strain
Antigen Tested       NO:       Serotypes, from Comparison of Pre-immune vs. Post-immune Antiserum
GBSP3                1         15-20
GBSP14               2         20-30
GBSP15               3         10-25
GBSP22               4         20-30
GBSP23               5         40-220
GBSP24               6         20-70
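By way of illustration only, a range of fold increase of the kind reported in Table 4 could be derived from paired plate readings as sketched below. The sketch assumes that fold increase is the ratio of the post-immune signal to the pre-immune signal for each serotype; the function name is hypothetical, and the readings are made-up numbers, not data from the experiments described herein.

```python
def fold_increase_range(pre_immune, post_immune):
    """Return (min, max) fold increase of post-immune over pre-immune signal.

    Each list holds one reading per serotype, in the same order.
    Assumes fold increase = post-immune signal / pre-immune signal.
    """
    if len(pre_immune) != len(post_immune):
        raise ValueError("one pre-immune and one post-immune reading per serotype")
    folds = [post / pre for pre, post in zip(pre_immune, post_immune)]
    return min(folds), max(folds)


# Made-up readings for three serotypes (illustrative values only).
pre = [0.05, 0.04, 0.05]
post = [1.00, 0.60, 0.90]

low, high = fold_increase_range(pre, post)
print(f"{low:.0f}-{high:.0f} fold")
```

For these illustrative readings the per-serotype ratios are about 20, 15, and 18, giving a reported range of 15-20 fold.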

Whole cell ELISA results confirmed that the cold spot polypeptides are antigenic, presenting antibody-recognizable targets on the surface of pneumococcal cells. The fact that the tested polyclonal antiserum included anti-cold spot polypeptide IgG antibodies (bound by the reporter antibody) indicates that the surface antigens are immunogenic, eliciting a T cell-dependent antibody response.

Flow cytometry was also performed to evaluate surface antigenicity, using 1:100 dilution of antisera. The relevance of the six cold spot polypeptides was also confirmed by testing with pooled healthy human antiserum and rabbit antiserum from whole cell vaccinations.

Evaluation of Cold Spot Polypeptide Antisera for the Ability to Elicit Protection In Vivo

The antisera raised by the six selected cold spot polypeptides were next tested in a passive immunity model described in Briles et al., J. Infect. Dis., 182(6):1694-1701 (2000). Pre-immune and post-immune antiserum from the cold spot polypeptide immunizations was administered intraperitoneally to groups of CBA/N mice at 1:25, 1:100, 1:400, or 1:1600 dilutions one hour before intravenous challenge with a mouse-virulent strain of S. pneumoniae. Ringer's injection solution was used as a negative control; a known protective anti-PspA monoclonal antibody for the challenge strain was used as a positive control. Challenged mice were observed for changes in health and time to moribund. Time to moribund was determined by monitoring the mice every 6 hours post-challenge for signs of disease: hunched back, ruffled fur, irritability, lack of mobility, or failure to respond to touch. As soon as any of these signs was observed, the surface temperature, determined with a scanning thermometer, was checked every 6 hours. When the surface temperature fell to 25° C., mice were considered moribund, were scored as such, and were euthanized by CO₂ narcosis and cervical dislocation. In some cases, heart blood, collected after death, was plated to verify bacteremia with pneumococci. The results indicated that three of the cold spot immunogens (GBSP14, GBSP15, and GBSP23) were at least partially protective: time to death was extended with GBSP15, and GBSP23 was “statistically protective”. Mice receiving GBSP14 antiserum exhibited passive immunity protection equivalent to the PspA monoclonal antibody positive control.

Active immunization experiments are under way using 10 μg of antigen in a 10 μL volume of Pierce Alum Imject adjuvant, administered subcutaneously three times at 2-week intervals. It is expected that this immunization will result in protective immunity, demonstrated, e.g., by intravenous challenge with active S. pneumoniae bacteria.

Results from the battery of tests performed using purified antigen and rabbit antisera are shown in Table 5 below.

TABLE 5
Test Results for Different Vaccine Candidates

Candidate      Whole Cell    Flow         Passive protection    Human Pooled             Whole Cell
Designation    ELISA         cytometry    assay                 Serum           IgG      Vaccine
GBSP3          ++            −            −                     +++             +++      ++
GBSP14         ++            −            +++                   +++             +++      ++
GBSP15         ++            +            +                     +++             +++      ++
GBSP22         −             ++           −                     +++             +++      ++
GBSP23         −             −            +++                   +++             +++      ++
GBSP24         −             −            −                     +++             +++      ++

“−” negative; “+” slightly positive; “++” positive; “+++” strongly positive.

The results from the assay with healthy human serum show that the general population has been exposed to S. pneumoniae and has generated a background anti-pneumococcal response even in the absence of pathogenic infection. Significantly, the assay reported here indicates that the six selected cold spot surface antigens were among the host of pneumococcal antigens recognized by the pooled serum.

All of the studied S. pneumoniae cold spot surface antigens were immunogenic in at least some of the plural tests. Differing results from test to test may be attributable to such factors as amount of immunogen utilized, purity of the inoculum, number of boosts, or factors pertaining to the target antigen itself such as differences in biological function, differential gene expression, or the effect of antibody complexation at the cell surface as part of the assay. For example, a very abundant protein on the cell surface might outperform a less abundant protein in the whole cell ELISA, e.g., by presenting many more antibody targets on the cell surface. In contrast, a cell surface antigen which, even in low abundance, led to cell death when complexed with antibody would be a high-performing antigen in the passive immunity assay.

From the foregoing results, it is seen that this invention provides a family of immunogenic S. pneumoniae surface antigens that will be useful for immunization of subjects, including humans, for eliciting an anti-S. pneumoniae antibody response. Optimization of dosing and boosting schedules according to well known practices will result in immunogenic compositions effective as vaccines for inducing protective immunity against S. pneumoniae. Moreover, because the surface antigens disclosed herein are in-common to all or nearly all strains of S. pneumoniae, and because they all exhibit a very high degree of sequence invariability, as expressed by percent amino acid sequence pairwise homology across at least 231 different S. pneumoniae strains, the antigens of the present invention are expected to provide universal immunogens, from which breakthrough will be extremely rare if not unknown over time.

The present invention provides a means for determining and selecting S. pneumoniae surface antigens that are universal vaccine candidates by virtue of their essentially universal presence across all S. pneumoniae genomes and their high degree of amino acid sequence conservation among S. pneumoniae strains. While the particular surface antigens identified herein fit the rigorous selection criteria for a universal S. pneumoniae vaccine, a few antigens have been identified previously using alternative techniques such as antigenomic screening which qualify them as potential vaccine candidates without determination of their universal presence across pneumococcal genomes or their level of sequence homology. See, e.g., Giefing et al., J. Exp. Med., 205(1):117-131 (2008). In such instances where the present disclosure represents a rediscovery of highly immunogenic proteins that are disclosed herein to be, additionally, universally present and invariant, the proteins have been eliminated from the claims.

Additional embodiments of the invention and alternative methods adapted to a particular composition will be evident from studying the foregoing description. All such embodiments and obvious alternatives are intended to be within the scope of this invention, as defined by the claims that follow. The various publications, patents, and references cited in the foregoing description are incorporated herein by reference in their entirety. 

1. An immunogenic composition comprising: (a) at least one cold spot polypeptide or an immunogenic fragment thereof, which polypeptide comprises at least a portion of an extracellular domain of a Streptococcus pneumoniae surface protein, wherein said protein: (i) is in-common to all or nearly all strains of Streptococcus pneumoniae, and (ii) has at least a 98% average amino acid sequence pairwise homology among such strains of Streptococcus pneumoniae; (b) a pharmaceutically acceptable vehicle or excipient; and (c) optionally, an adjuvant.
 2. The immunogenic composition of claim 1, wherein said cold spot polypeptide has an average amino acid sequence pairwise homology of at least 99%.
 3. The immunogenic composition of claim 1, wherein said cold spot polypeptide has an average amino acid sequence pairwise homology of at least 99.5%.
 4. The immunogenic composition of claim 1, wherein said cold spot polypeptide has an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-273 and natural allelic variants thereof.
 5. The immunogenic composition of claim 1, wherein said cold spot polypeptide has an amino acid sequence selected from the group consisting of SEQ ID NOs: 336-341 and corresponding portions of the natural allelic variants of SEQ ID NOs: 1-6 as set forth in Table 1B.
 6. The immunogenic composition of claim 1, wherein said cold spot polypeptide is selected from the group set forth in the following table:

Cold Spot Polypeptide
(GenBank Designation in                       protein length
TIGR4 Genomic Sequence)    SEQ ID NO:         (aa)
SP_2207                    SEQ ID NO: 274     220
SP_2223                    SEQ ID NO: 276     276
SP_2226                    SEQ ID NO: 277     122
SP_0005                    SEQ ID NO: 278     189
SP_0007                    SEQ ID NO: 279     88
SP_0008                    SEQ ID NO: 280     122
SP_0010                    SEQ ID NO: 281     422
SP_0016                    SEQ ID NO: 283     112
SP_0020                    SEQ ID NO: 284     155
SP_0021                    SEQ ID NO: 285     147
SP_0024                    SEQ ID NO: 286     165
SP_0030                    SEQ ID NO: 287     108
SP_0037                    SEQ ID NO: 288     330
SP_0048                    SEQ ID NO: 291     181
SP_0050                    SEQ ID NO: 292     515
SP_0053                    SEQ ID NO: 293     162
SP_0057                    SEQ ID NO: 294     1,312
SP_1847                    SEQ ID NO: 297     193
SP_1850                    SEQ ID NO: 298     254
SP_1855                    SEQ ID NO: 299     345
SP_1865                    SEQ ID NO: 300     354
SP_1897                    SEQ ID NO: 301     419
SP_1923                    SEQ ID NO: 305     471
SP_1937                    SEQ ID NO: 306     318
SP_1948                    SEQ ID NO: 308     74
SP_1949                    SEQ ID NO: 309     62
SP_1954                    SEQ ID NO: 310     467
SP_1955                    SEQ ID NO: 311     103
SP_1980                    SEQ ID NO: 312     308
SP_1990                    SEQ ID NO: 313     186
SP_1991                    SEQ ID NO: 314     257
SP_1994                    SEQ ID NO: 315     404
SP_1997                    SEQ ID NO: 316     462
SP_2010                    SEQ ID NO: 318     731

and natural allelic variants thereof.
 7. A method of eliciting an immune response in a mammal, said method comprising administering to said mammal an immunogenic composition according to claim 1.
 8. The method of claim 7, wherein said mammal is a human.
 9. An immunogenic composition comprising two or more immunogenic compositions according to claim 1.
 10. A method of immunizing a subject against S. pneumoniae infection, said method comprising administering at least one immunogenic composition according to claim 1 to a subject in an amount or for a number of administrations sufficient to elicit an immune response including antibodies recognizing a majority of S. pneumoniae capsular serotypes.
 11. The method according to claim 10, wherein said administration elicits an immune response including antibodies recognizing at least 80 S. pneumoniae capsular serotypes.
 12. The method according to claim 10, wherein said administration elicits an immune response including antibodies recognizing at least 90 S. pneumoniae capsular serotypes.
 13. A method of making an immunogenic composition for raising an immune response producing antibodies reactive against at least 50 different serotypes of S. pneumoniae, said method comprising: (1) selecting one or more cold spot surface antigens of S. pneumoniae, (2) isolating one or more polypeptide segments from an extracellular domain of the one or more cold spot surface antigens selected in (1), and (3) formulating the isolated polypeptide segments obtained in (2) by admixing said one or more isolated polypeptide segments with a pharmaceutically acceptable carrier, to produce an immunogenic composition for inoculating human subjects against S. pneumoniae infection.
 14. The method according to claim 13, wherein said immunogenic composition includes one or more adjuvants.
 15. The method according to claim 13, wherein said immunogenic composition is effective for raising an immune response producing antibodies reactive with at least 80 S. pneumoniae serotypes.
 16. The method according to claim 13, wherein said immunogenic composition is effective for raising an immune response producing antibodies reactive with at least 90 S. pneumoniae serotypes.
 17. The method according to claim 13, wherein said immunogenic composition is effective for raising an immune response producing antibodies reactive with at least 93 S. pneumoniae serotypes.
 18. The method according to claim 13, wherein, in step (1), one, two, three, four, or five cold spot surface antigens are selected.
 19. The method according to claim 13, wherein, in step (1), one or two cold spot surface antigens are selected.
 20. The method according to claim 13, wherein, in step (1), one cold spot surface antigen is selected. 